Probing the Large-Scale Homogeneity of the Universe with Galaxy Redshift Surveys



Cristiano Giovanni Sabiu, BSc 



Thesis 
submitted to the 
University of Glasgow 
for the degree of 
MSc 







Astronomy & Astrophysics Group 
Department of Physics & Astronomy 
University of Glasgow 
Scotland 
September 2006 



(c) Cristiano Sabiu 2006 



Dedicated to my grandmother and my grandfather




Abstract 

Modern cosmological observations clearly reveal that the universe contains a
hierarchy of clustering. However, recent surveys show a transition to homogeneity on
large scales. The exact scale at which this transition occurs is still a topic of much
debate. Much work has been done in trying to characterise the galaxy distribution
using multifractals. However, for a number of years the size, depth and accuracy of
galaxy surveys were regarded as insufficient to give a definitive answer. One of the
main problems which arises in a multifractal analysis is how to deal with
observational selection effects, i.e. 'masks' in the survey region and a geometric
boundary to the survey itself.

In this thesis I will introduce a volume boundary correction which is rather similar
to the approach developed by Pan and Coles in 2001, but which improves on their
angular boundary correction in two important respects: firstly, our volume correction
'throws away' fewer galaxies close to the boundary of a given data set, and secondly,
it is computationally more efficient.

After application of our volume correction, I will then show how the underlying
generalised dimensions of a given point set can be computed. I will apply this
procedure to calculate the generalised fractal dimensions of both simulated fractal
point sets and mock galaxy surveys which mimic the properties of the recent IRAS
PSCz catalogue.

Chapter 1
Introduction 



This thesis is devoted to a study of the large scale structure (LSS) of the universe 
and as such belongs in the field of Cosmology. Cosmology is the study of structure 
and evolution in the universe. The main constituents of LSS are individual galaxies 
and clusters of galaxies up to Gigaparsec (Gpc) scales. 



1.1 Cosmology through the ages 



The ancient Greeks were undoubtedly the leaders in astronomical understanding 
of their time. Around the 4th century BC a general consensus emerged, from the 
combined ideas of many philosophers, including Plato and Aristotle, which put our 
spherical Earth at the centre of the universe. They speculated that the Sun, Moon 
and planets were carried around the Earth on concentric spheres, arranged: Moon, 
Sun, Venus, Mercury, Mars, Jupiter, Saturn and the fixed stars beyond. Aristotle 
was to later elaborate on this geocentric model by trying to explain the lunar cycle.

An updated geocentric theory of the heavens was put together by the astronomer
Claudius Ptolemy from many works in Greek astronomy. His Ptolemaic model was 
penned in the 2nd century AD and stood as the standard theory for more than a 
millennium. Ptolemy made extensive use of epicycles to explain many aspects of 
planetary motion. In particular, his epicyclic explanation of retrograde motion in 
the planets helped elevate this theory to the forefront of astronomical thinking. This 
picture of our universe stood solid until the 16th century when Nicolaus Copernicus 
changed forever our view of the cosmos. 

Copernicus, the famous Polish astronomer, showed that a model with the Sun at its
centre could explain the motion of the planets in a very simple way, with no need for
complicated orbits and epicycles. However, this Heliocentric model was not new at
all: its origins dated back many centuries BC to the work of an Indian philosopher,
Yajnavalkya. He had the vision to see that the Sun, being the most important of the
heavenly bodies, should be at the centre of our universe. However, he lacked any real
observational or scientific evidence. This Heliocentric idea was also present in
ancient Greece, held strongly by the Pythagoreans. The first to propose this was
Aristarchus of Samos (c. 270 BC) and later Archimedes, the Greek scientist, was
swayed by this line of thinking.



1.2 The Cosmological Principle 

Despite the greater simplicity of the Heliocentric system, it would not come to
dominate the astronomical community. This was, at least in part, owing not to
scientific reasons but to religious prejudice. Many religions held the false belief
that the Earth was somehow special and therefore must be the centre of our observable
universe. The Roman Catholic Church had a strong hold on science and particularly
astronomy (due to its close connection to the heavens). Any theories which did not
conform to the teachings of the Bible were deemed 'heretical' and were hidden away
from public knowledge. Despite this dogma, Copernicus in the 16th century managed to
garner much support for the Heliocentric model, mainly due to his scientific treatise,
De Revolutionibus (1543), and Galileo's supporting observations. Galileo later opposed
the Catholic Church by his strong support for the Copernican ideas. While on trial for
heresy he famously said of the Earth, "Eppur si muove" (and yet it moves).*

So why are we dwelling in the past here? The reason is this: Copernicus and Galileo
did not see themselves at the centre of the solar system. They had displaced everyone
from a special location in space and this trend would not stop with the Earth. It
would later lead to the whole solar system being placed in the outer rim of our Milky
Way galaxy. Then, in the early 20th century, our galaxy became one of many. Now we are
but a mere speck of dust in a vast, ever expanding universe, a fact that was first
realised by the American astronomer Edwin Hubble in the 1920s.



Edwin Hubble 

Edwin Hubble studied the systematic variations of the redshift in the 'spiral
nebulae', as they were known. The redshift $z$ is defined as,

$$ z = \frac{\lambda_o}{\lambda_e} - 1, \tag{1.1} $$

where $\lambda_o$ and $\lambda_e$ are the observed and emitted wavelengths of light,
respectively.

Hubble used this technique to investigate populations of similar objects, usually
galaxies of a particular morphological type, and examined the relationship between
their redshifts and their relative brightnesses. What he found is now known as
Hubble's Law: the redshift in the spectra of the objects grows as the objects become
more distant. The farther away an object, the faster it is receding from us. This,
Hubble concluded, is because the universe itself is expanding, a fact that (as we will
see in the next Chapter) was consistent with the theoretical predictions of Einstein's
General Theory of Relativity.

* Galileo probably never spoke these exact words; however, they stand as a symbol of
his support for scientific truth.

In fact Hubble found this connection to be a linear relation between recessional
velocity and distance. The usual Hubble law is written as,

$$ v = H_0 D, \tag{1.2} $$

where $v$ is the recessional velocity inferred from the redshift, typically expressed
in km/s. $H_0$ is Hubble's constant and corresponds to the value of $H$ (often termed
the Hubble parameter, which is time dependent) in the Friedmann equations (cf. eq.
(2.13)) taken at the time of observation, denoted by the subscript 0. This value is
the same throughout the universe for a given conformal time. $D$ is the proper
distance that the light has travelled from the galaxy in the rest frame of the
observer, measured in megaparsecs (Mpc).

For relatively nearby galaxies (i.e. $z \ll 1$), the velocity $v$ can be estimated
from the galaxy's redshift $z$ using the formula $v = zc$, where $c$ is the speed of
light. For more distant galaxies, the relationship between recession velocity and
distance becomes more complicated and requires general relativity (see Chapter 2).

In using Hubble's law to determine distances, only the velocity due to the expansion
of the universe should be used. Since gravitationally interacting galaxies move
relative to each other independently of the expansion of the universe, these relative
velocities, called peculiar velocities, need to be accounted for when applying
Hubble's law. So more generally the Hubble law is,

$$ v_{rec} = H_0 D + v_{pec}. \tag{1.3} $$

In this case, $v_{pec}$ is the radial component of the peculiar motion of the object.
As an example, the Local Group of galaxies has $v_{pec} \approx 600$ km s$^{-1}$ in
the direction of the constellation Hydra.
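As a minimal numerical sketch of the linear Hubble law with and without a peculiar
velocity term, the snippet below converts a small redshift into a distance. The value
$H_0 = 70$ km s$^{-1}$ Mpc$^{-1}$, the function name `hubble_distance_mpc` and the
cut-off $z < 0.1$ are assumptions made purely for illustration; none of them is taken
from the text.

```python
C_KM_S = 299_792.458  # speed of light in km/s


def hubble_distance_mpc(z, h0=70.0, v_pec=0.0):
    """Distance in Mpc from D = (c z - v_pec) / H0, valid only for z << 1.

    h0 is an assumed Hubble constant in km/s/Mpc; v_pec is the radial
    peculiar velocity in km/s (positive when directed away from us).
    """
    if not 0.0 <= z < 0.1:
        raise ValueError("linear Hubble law assumed; use z << 1")
    v_rec = C_KM_S * z            # recession velocity inferred from the redshift
    return (v_rec - v_pec) / h0   # remove the peculiar part before dividing by H0


d_naive = hubble_distance_mpc(0.01)
d_corrected = hubble_distance_mpc(0.01, v_pec=600.0)
print(f"no correction: {d_naive:.2f} Mpc, with v_pec = 600 km/s: {d_corrected:.2f} Mpc")
```

At $z = 0.01$ the two estimates differ by roughly 20%, which illustrates why peculiar
velocities matter most for nearby galaxies.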

It is straightforward to show that the observation of Hubble's Law is consistent
with what we would expect in a universe which is homogeneous (i.e. looks the same
everywhere) and isotropic (i.e. looks the same in all directions). A number of modern
cosmological observations support the properties of homogeneity and isotropy,
including the distribution of galaxies on large scales (which will be the main topic
of this thesis) and the smoothness of the cosmic background radiation. Together, the
assumptions of homogeneity and isotropy are known as the Cosmological Principle
(CP).



1.3 Our view of the Universe 



If one is to understand anything about the large scale structure of the universe,
it is generally advisable to know where the galaxies that make up that structure
are. Mapping and understanding the spatial galaxy distribution is a prerequisite
for constructing a viable model of structure formation in the universe. This effort
reached its first milestone with the Abell, Zwicky & Lick catalogues, which eventually
documented the angular positions of around a million galaxies. The second step, then,
was to expand the 2-D galaxy information by including the distances to the galaxies,
obtained via redshift surveys.



Abell, Zwicky & Lick

Prior to the 1980s, knowledge of the large scale structure of the universe was limited
to only the angular distributions of galaxies, and a very uniform microwave background.
In the late 1950s, G. Abell (1958) collected several thousand angular positions of
galaxies from the Palomar Sky Survey. This catalogue did not contain any information
concerning the distance to the galaxies; it was essentially a projection of the true
galaxy distribution onto a sphere. Then in the 1960s Fritz Zwicky and collaborators
visually scanned thousands of photographic plates from the same survey, obtaining
positions of over 30,000 galaxies in the northern sky (Zwicky et al. 1968). After that,
in the early 1980s, Schechtman accumulated a catalogue of 1 million galaxies in the
northern sky from the Lick astrographic survey (see figure 1.1).

Figure 1.1: Lick galaxy survey, adapted from Peebles (1993)

The Cambridge APM survey followed in the early 1990s, cataloguing about 2 million
galaxies in the Southern Galactic Cap (Maddox et al. 1990); see figure 1.2.

Figure 1.2: The APM galaxy survey

Redshift Surveys

Redshift measurements involve determining the spectrum of the object to be measured.
Once that is known, recognisable spectral lines can be found, and their deviation
from their normal positions used to find the object's redshift. The Hubble Law then
allows one to turn that redshift into a radial distance from our galaxy.

When redshifts were first being measured, it would typically take a few hours on
a large telescope to collect enough photons to obtain the required spectrum. Once
telescopes with enough light gathering power became available and spectroscopic
detectors became sophisticated enough to allow many redshifts to be taken
simultaneously in a reasonable amount of time, astronomers started using these
instruments to make maps of the three-dimensional locations of galaxies and galaxy
clusters.

Figure 1.3 is a representation of some of the measured three-dimensional galaxy
positions in redshift space. The radial coordinate in this plot is the measured redshift
(essentially indicating the distance from us) and the angular coordinates represent
the angular position of the objects in the sky. Thus the 3-D mapping of the universe
began:

• In the late 1970s redshift surveys finally became a reality with the very successful
CfA (Center for Astrophysics) survey (Huchra et al. 1983), which recorded 1100
spectroscopic redshifts.

• In the early 1990s the PSCz redshift survey (Saunders et al. 2000) mapped
15,000 spiral galaxies over ~83% of the sky. This still stands as the largest
survey in terms of sky coverage.

• From 1998 to 2003, the Two Degree Field Galaxy Redshift Survey, using the
Anglo-Australian Telescope, accumulated 220,000 galaxy redshifts (Colless et al.
1999). This survey is illustrated in figure 1.3.

• The Sloan Digital Sky Survey (SDSS) is an ongoing attempt to collect 1 million
galaxy redshifts. The current release, DR5, contains 674,749 galaxies. See
Percival et al. 2006 (DR5) and Abazajian et al. 2005 (DR4).

1.3.1 Problems with this view 

The information directly available to us on the observed spatial distribution of
galaxies is systematically biased compared to the true galaxy distribution.

Figure 1.3: 2dF galaxy redshift survey.

Angular positions are trivial to measure, but precise distance estimates are more
difficult to obtain. More serious, however, is the problem that the sampling rate of
observable galaxies depends strongly on redshift. If one considers the universe to be
homogeneously filled with matter, then we would expect the number of objects, in some
observed solid angle of the sky, to grow as $r^2$. However, this is not what we
actually observe; instead we see distributions like figure 5.5, where the reference
curve initially has the form of $r^2$ but soon drops off. This effect is due to the
diminished flux of faraway objects and our inability to collect enough photons from
them to make valid observations. This 'selection effect' can be seen in figure 1.3 as
the fall-off in the apparent number density of sampled galaxies at larger redshift. It
is accounted for by weighting each observed object with the inverse of the selection
function, $\phi(r)$, which increases the contribution of the counted objects. The
functional form of the selection function is,

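$$ \phi(r) = \left(\frac{r_0}{r}\right)^{2\alpha} \left(\frac{r_*^2 + r_0^2}{r_*^2 + r^2}\right)^{\beta}. \tag{1.4} $$

In the above expression $r_*$, $\alpha$, $\beta$ and $r_0$ are set parameters and $r$
is the proper distance to the observed galaxy.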
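The inverse-selection weighting described above can be illustrated with a small toy
experiment. This is only a sketch: the selection function used here (complete out to
`r0`, then falling as $(r_0/r)^2$) is a hypothetical stand-in chosen for simplicity,
not the PSCz form, and the sample size, bin edges and function names are likewise
illustrative assumptions.

```python
import random


def selection(r, r0=50.0):
    """Hypothetical toy selection function: complete to r0, then (r0 / r)^2."""
    return 1.0 if r <= r0 else (r0 / r) ** 2


def weighted_counts(distances, edges, phi=selection):
    """Radial histogram in which each galaxy carries the weight 1 / phi(r)."""
    counts = [0.0] * (len(edges) - 1)
    for r in distances:
        for i in range(len(edges) - 1):
            if edges[i] <= r < edges[i + 1]:
                counts[i] += 1.0 / phi(r)
                break
    return counts


rng = random.Random(1)
# Homogeneous parent sample inside a sphere of radius 200: p(r) dr ~ r^2 dr.
true_r = [200.0 * rng.random() ** (1.0 / 3.0) for _ in range(20000)]
# "Observe" each galaxy with probability phi(r).
observed = [r for r in true_r if rng.random() < selection(r)]

edges = [0.0, 50.0, 100.0, 150.0, 200.0]
truth = weighted_counts(true_r, edges, phi=lambda r: 1.0)
raw = weighted_counts(observed, edges, phi=lambda r: 1.0)
corrected = weighted_counts(observed, edges)

for lo, hi, t, u, c in zip(edges, edges[1:], truth, raw, corrected):
    print(f"{lo:5.0f}-{hi:<5.0f} true={t:8.0f} raw={u:8.0f} weighted={c:8.0f}")
```

The raw counts fall well below the parent sample in the outer shells, while the
weighted counts recover it on average, at the price of larger shot noise where
$\phi(r)$ is small.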

Apart from redshifts, distances can also be estimated from the flux, $F_\nu$,
received from a source. To use this method accurately, the intrinsic luminosity,
$L_\nu$, of the source must be well known, as must the fashion in which it radiates,
i.e. beamed or uniform. Considering the source to radiate uniformly, the distance
$d$ can be calculated from,

$$ d = \sqrt{\frac{L_\nu}{4\pi F_\nu}}. \tag{1.5} $$



The main problem with this distance estimator is that the vacuum of space is not
so empty. It is filled with dust particles and gas, collectively termed the Interstellar
Medium (ISM). The ISM can cause extinction of light (diminished flux) and leads to
eq. (1.5) giving the wrong answer. The extinction of light is maximal at low galactic
latitudes due to the dust content of our own galaxy. In fact at its worst this effect
can completely block out the light coming from distant sources.
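To make the size of this bias concrete, the sketch below applies the inverse-square
law to a flux that has been dimmed by extinction. Expressing the extinction as $A$
magnitudes, with $F_{obs} = F \times 10^{-0.4A}$, is the usual astronomical convention
but is an addition to the text; the function names and the numbers are illustrative
assumptions in arbitrary units.

```python
import math


def flux_distance(luminosity, flux):
    """Distance of a uniformly radiating source, d = sqrt(L / (4 pi F))."""
    return math.sqrt(luminosity / (4.0 * math.pi * flux))


def dimmed_flux(flux, extinction_mag):
    """Flux after extinction of A magnitudes: F_obs = F * 10^(-0.4 A)."""
    return flux * 10.0 ** (-0.4 * extinction_mag)


L = 1.0                                   # intrinsic luminosity (arbitrary units)
d_true = 10.0                             # true distance (arbitrary units)
F = L / (4.0 * math.pi * d_true ** 2)     # flux with no extinction

d_est = flux_distance(L, dimmed_flux(F, 1.0))   # one magnitude of extinction
print(f"true distance {d_true:.1f}, estimate with A = 1 mag: {d_est:.2f}")
```

Because the inferred distance scales as $F^{-1/2}$, ignoring $A$ magnitudes of
extinction overestimates the distance by a factor $10^{0.2A}$.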



In figure 5.4 the masked regions (black) are shown for the PSCz survey. The
majority of the masked area is due to the extinction of light through the galaxy. The
sweeping arcs (north and south) are due to cryogenic problems in the satellite near
the end of the survey period, which meant that the survey was not completed. In
figure 4.1 some masked regions for the 2dF survey are shown. These masks are placed
over the survey because local objects may be blocking the view in a particular
direction.

Chapter 2

The Universe at Large 



Of the four fundamental forces of nature, the universe on large scales is governed by
a single force: gravity. In the following section we will use Einstein's General
Relativity to construct the main equations of cosmological evolution. From there we
will relax our use of complicated tensor algebra and work in a Newtonian
approximation to study the departures from the homogeneous Friedmann equations,
due to the growth under gravity of tiny density inhomogeneities in the Universe.
This is the 'first order' Universe.



2.1 General Relativity 



In 1915 Einstein developed the theory of General Relativity to explain gravity as a
consequence of the fundamental connection between the geometry and matter content
of space-time. This can be summed up in the neat phrase 'Space-time tells matter
how to move and matter tells space-time how to curve'. More specifically, Einstein
developed a set of equations which balanced the curvature of space-time and the
matter it contained. It is this curvature which generates the force of gravity and
determines how matter moves within it. Generally, this can be written as,

$$ 8\pi G\, T_{ik} = G_{ik} = R_{ik} - \frac{1}{2} g_{ik} R - \Lambda g_{ik}. \tag{2.1} $$



To explain the ideas and physical motivation of the above equation is beyond the
scope of this thesis (for further details see Misner et al. 1973 and Weinberg 1972 for
a more physical description). In eq. (2.1), $G_{ik}$ is the Einstein tensor, $R_{ik}$
and $R$ are the Ricci tensor and scalar respectively, $g_{ik}$ is the space-time
metric and $\Lambda$ is the Cosmological Constant. Although its inclusion in eq. (2.1)
was described by Einstein to be his "Greatest blunder", it is nowadays a very crucial
parameter in cosmology. We will describe in more detail each of the above parameters
as we go along.

In our study of cosmology we need a theory of gravity, which we have, and a
metric to describe the space-time. A metric is a function that measures the distance
between events in space-time. Assuming that the CP is true, the form of this metric
must conserve the CP as described in §1.2, i.e. it must be homogeneous and isotropic.
To construct the metric with the CP in mind, we slice up the 4-D space-time along
$x^0 = \mathrm{const.}$ hypersurfaces, i.e. constant time.

To impose isotropy it is easier to tackle this problem in spherical coordinates since
there is no preferred direction. For homogeneity we insist that the Ricci scalar is
constant at all points on the hypersphere. And since we are dealing with a universe
which is expanding it would be to our advantage to use a coordinate system which
reflects this property. So we will work in a comoving system and therefore preserve
the positions of galaxies relative to one another. Under the above assumptions it can
then be shown (e.g. Carroll S.M. 2004) that the metric takes the following form,

$$ ds^2 = -dt^2 + a^2(t)\left[\frac{dr^2}{1 - kr^2} + r^2\left(d\theta^2 + \sin^2\theta\, d\phi^2\right)\right]. \tag{2.2} $$

This is known as the Robertson-Walker metric (hereafter RWM). In the above
expression $a(t)$ is the cosmic scale factor, $k$ is a constant which determines the
spatial curvature and we are working in units where $c = 1$. $k$ may take any real
value but by suitable re-scaling of our coordinates it is only necessary to consider
3 possibilities.



• $k = 0$ corresponds to zero curvature and thus a flat universe. Initially parallel
trajectories remain parallel.

• $k = +1$ has positive curvature and the universe is closed. Its geometry is like
the surface of a sphere. Initially parallel trajectories will converge.

• $k = -1$ has negative curvature and leads to an open universe. Its geometry
can be represented by the surface of a saddle. Initially parallel trajectories will
diverge.



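The RWM in component notation is,

$$ g_{\alpha\beta} = \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & a^2(t)/(1 - kr^2) & 0 & 0 \\ 0 & 0 & a^2(t)\, r^2 & 0 \\ 0 & 0 & 0 & a^2(t)\, r^2 \sin^2\theta \end{pmatrix}. \tag{2.3} $$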



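With the above information about the metric we can go about obtaining the
non-vanishing components of the Christoffel symbols, $\Gamma^{\mu}_{\beta\gamma}$. The
components are calculated via,

$$ \Gamma^{\mu}_{\beta\gamma} = \frac{1}{2} g^{\mu\alpha}\left(g_{\alpha\beta,\gamma} + g_{\alpha\gamma,\beta} - g_{\beta\gamma,\alpha}\right), \tag{2.4} $$

where $\alpha$ is a summation index and a comma denotes partial derivative. Now it is
merely a case of turning the handle to derive the essential non-zero elements of the
Christoffel symbols for the metric given by eq. (2.2), with the indices 0, 1, 2, 3
labelling $t$, $r$, $\theta$, $\phi$:

$$ \begin{aligned}
\Gamma^0_{11} &= \frac{a\dot{a}}{1 - kr^2}, & \Gamma^1_{11} &= \frac{kr}{1 - kr^2}, \\
\Gamma^0_{22} &= a\dot{a}\, r^2, & \Gamma^0_{33} &= a\dot{a}\, r^2 \sin^2\theta, \\
\Gamma^1_{22} &= r(kr^2 - 1), & \Gamma^1_{33} &= r(kr^2 - 1)\sin^2\theta, \\
\Gamma^2_{33} &= -\sin\theta\cos\theta, & \Gamma^3_{23} &= \cot\theta, \\
\Gamma^1_{01} &= \Gamma^2_{02} = \Gamma^3_{03} = \frac{\dot{a}}{a}, & \Gamma^2_{12} &= \Gamma^3_{13} = \frac{1}{r}.
\end{aligned} \tag{2.5} $$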



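From the above set of equations we can now determine the nonzero components of
the Ricci tensor, $R_{\alpha\beta}$. This is constructed by contraction of the Riemann
tensor, $R^{\lambda}_{\alpha\mu\beta}$,

$$ R_{\alpha\beta} = R^{\lambda}_{\alpha\lambda\beta}. \tag{2.6} $$

Summation is implied on the index $\lambda$ according to the Einstein summation
convention. The components of the Ricci tensor are therefore related to the
Christoffel symbols above by,

$$ R_{\alpha\beta} = \Gamma^{\tau}_{\alpha\beta}\Gamma^{\rho}_{\tau\rho} - \Gamma^{\tau}_{\alpha\rho}\Gamma^{\rho}_{\tau\beta} + \Gamma^{\rho}_{\alpha\beta,\rho} - \Gamma^{\rho}_{\alpha\rho,\beta}. \tag{2.7} $$

In the above expression summation is implied over $\tau$ and $\rho$. As an example
this leads to, for the $tt$ component:

$$ \begin{aligned}
R_{tt} &= \Gamma^{\tau}_{tt}\Gamma^{\rho}_{\tau\rho} - \Gamma^{\tau}_{t\rho}\Gamma^{\rho}_{\tau t} + \Gamma^{\rho}_{tt,\rho} - \Gamma^{\rho}_{t\rho,t} \\
&= -\Gamma^{\tau}_{t\rho}\Gamma^{\rho}_{\tau t} - \Gamma^{\rho}_{t\rho,t} \\
&= -3\left(\frac{\dot{a}}{a}\right)^2 - 3\left[\frac{\ddot{a}}{a} - \left(\frac{\dot{a}}{a}\right)^2\right] = -3\frac{\ddot{a}}{a}.
\end{aligned} \tag{2.8} $$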
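The rest of the nonzero components are:

$$ \begin{aligned}
R_{rr} &= \frac{a\ddot{a} + 2\dot{a}^2 + 2k}{1 - kr^2}, \\
R_{\theta\theta} &= r^2\left(a\ddot{a} + 2\dot{a}^2 + 2k\right), \\
R_{\phi\phi} &= r^2\left(a\ddot{a} + 2\dot{a}^2 + 2k\right)\sin^2\theta.
\end{aligned} \tag{2.9} $$

The Ricci scalar, $R$, is then obtained via the contraction of $R_{\alpha\beta}$, i.e.,

$$ R = g^{\alpha\beta} R_{\alpha\beta}. \tag{2.10} $$

Therefore,

$$ R = g^{tt}R_{tt} + g^{rr}R_{rr} + g^{\theta\theta}R_{\theta\theta} + g^{\phi\phi}R_{\phi\phi} = 6\left[\frac{\ddot{a}}{a} + \left(\frac{\dot{a}}{a}\right)^2 + \frac{k}{a^2}\right]. \tag{2.11} $$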



Now looking back at equation (2.1), we can compute everything on the RHS of this
equation. To evaluate the LHS we must define our energy-momentum tensor, $T_{ik}$.
This tensor describes the matter and energy content of the universe. A perfect fluid
approximation is completely defined by two quantities: the rest frame energy density
$\rho$ and the isotropic rest frame pressure $p$,

$$ T_{ik} = \begin{pmatrix} \rho & 0 & 0 & 0 \\ 0 & p & 0 & 0 \\ 0 & 0 & p & 0 \\ 0 & 0 & 0 & p \end{pmatrix}. \tag{2.12} $$

Now using eq. (2.12) with eq. (2.1) and assuming that $\Lambda = 0$, we can obtain two
different equations. From the $tt$ component we get,

$$ \left(\frac{\dot{a}}{a}\right)^2 = \frac{8\pi G}{3}\rho - \frac{k}{a^2}, \tag{2.13} $$
and from the other components we obtain

$$ \frac{\ddot{a}}{a} = -\frac{4\pi G}{3}(\rho + 3p). \tag{2.14} $$

These expressions are known as the Friedmann equations. $H$ is the Hubble parameter
and is defined as,

$$ H = \frac{\dot{a}}{a}. \tag{2.15} $$

Apart from the global expansion, we can now investigate other aspects of the RWM
universe. The density parameter $\Omega$ is defined as,

$$ \Omega = \frac{8\pi G}{3H^2}\rho = \frac{\rho}{\rho_c}, \tag{2.16} $$

with $\rho_c$ being the critical density required to produce a flat universe.
Combining the above equation with (2.13) gives,

$$ H^2 = \frac{8\pi G}{3}\rho_c\Omega - \frac{k}{a^2} = H^2\Omega - \frac{k}{a^2}, \tag{2.17} $$

and rearranging this gives,

$$ \Omega - 1 = \frac{k}{a^2 H^2}. \tag{2.18} $$

The special case of $\Omega = 1$ implies that $k = 0$, and since $k$ is a fixed
constant it must be concluded that $\Omega = 1$ for all time.
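For a sense of scale, the critical density and the sign of the curvature can be
evaluated numerically. The sketch below assumes $H_0 = 70$ km s$^{-1}$ Mpc$^{-1}$ and
rounded SI constants; these values and all function names are illustrative
assumptions rather than quantities quoted in the text.

```python
import math

G_SI = 6.674e-11        # gravitational constant in m^3 kg^-1 s^-2 (rounded)
MPC_IN_M = 3.0857e22    # one megaparsec in metres (rounded)


def hubble_si(h0_km_s_mpc):
    """Convert a Hubble parameter from km/s/Mpc to 1/s."""
    return h0_km_s_mpc * 1.0e3 / MPC_IN_M


def critical_density(h0_km_s_mpc):
    """rho_c = 3 H^2 / (8 pi G), the density for which Omega = 1."""
    return 3.0 * hubble_si(h0_km_s_mpc) ** 2 / (8.0 * math.pi * G_SI)


def density_parameter(rho, h0_km_s_mpc):
    """Omega = rho / rho_c."""
    return rho / critical_density(h0_km_s_mpc)


def curvature_sign(omega, tol=1e-12):
    """Sign of k implied by Omega - 1 = k / (a^2 H^2)."""
    if abs(omega - 1.0) < tol:
        return 0
    return 1 if omega > 1.0 else -1


rho_c = critical_density(70.0)
print(f"rho_c = {rho_c:.3e} kg/m^3")
print("k sign for Omega = 0.3:", curvature_sign(density_parameter(0.3 * rho_c, 70.0)))
```

With these inputs the critical density comes out at roughly $10^{-26}$ kg m$^{-3}$,
i.e. a few hydrogen atoms per cubic metre.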



2.2 Structure Formation 

There are two crucial observations that any model of structure formation has to
explain: the quadrupole anisotropy of the CMB, as measured by WMAP, is one part
in $10^5$ (Hinshaw et al. 2003), suggesting that the amplitude of the fluctuations was
very small at the epoch of recombination; while redshift surveys of the local universe
show highly inhomogeneous matter distributions over galactic and cluster scales.
Gravitational instability is believed to be the physical mechanism which amplifies the
small primeval fluctuations into the structure that we observe today.

We now have the main ingredients for analysing the evolution of structure, at
least within a linear approximation; these are contained within equations (2.13) -
(2.15). Combining the Friedmann equations and rearranging gives us the continuity
equation,

$$ \dot{\rho} = -3H(\rho + p), \tag{2.19} $$

which shows the conservation of energy as the universe expands. To proceed we will
also make the assumption that the matter content of the universe is well described
by an ideal gas equation of state, i.e.,

$$ p = w\rho. \tag{2.20} $$

In standard cosmology we assume that the majority of the matter is in the form
of cold dark matter (CDM), which has the equation of state $w = 0$. So the pressure
component is assumed to be negligible, which greatly simplifies the following
calculations.
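As a short worked step, the continuity equation can be integrated directly once the
equation of state is inserted. The sketch below uses only $\dot{\rho} = -3H(\rho + p)$,
$p = w\rho$ and $H = \dot{a}/a$, and assumes that $w$ is constant in time (an
assumption of this illustration):

```latex
\begin{align}
\dot{\rho} &= -3H(\rho + p) = -3\,\frac{\dot{a}}{a}\,(1 + w)\,\rho
  && \text{using } p = w\rho,\ H = \dot{a}/a \\
\frac{d\rho}{\rho} &= -3(1 + w)\,\frac{da}{a} \\
\rho &\propto a^{-3(1 + w)}
  && \Rightarrow\ \rho \propto a^{-3} \text{ for CDM } (w = 0).
\end{align}
```

The $w = 0$ case is simply the dilution of a fixed amount of matter in a volume that
grows as $a^3$.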

In eq. (2.2) the effect of the curvature is small for distances much less than the
Hubble radius $cH_0^{-1} = 3000\,h^{-1}$ Mpc (a variety of observations clearly favour
$\Omega_{tot} \approx 1$ and $\Omega_\Lambda \approx 0.7$, therefore $|k| \ll H_0^2$).
Hence, the RWM is well approximated by
$ds^2 = c^2 dt^2 - a^2(t)\,(dx^2 + dy^2 + dz^2)$ (similar to the flat Minkowski metric
of special relativity but with a time varying rescaling), where $(x, y, z)$ denote the
comoving Cartesian coordinates. Assuming a conformal Newtonian gauge* (cf. Ma &
Bertschinger 1995) the Einstein field equations applied to the first-order
perturbations of such a metric yield the Poisson equation of Newtonian gravity:

$$ \nabla^2\phi = 4\pi a^2 G\,\delta\rho, \tag{2.21} $$

where $\delta\rho = \rho(\mathbf{x}, t) - \bar{\rho}(t)$ indicates the fluctuation of
the mass density about the mean density $\bar{\rho}(t)$ and $\phi$ is interpreted as
the Newtonian potential. Note that eq. (2.21) does not assume that $\delta\rho$ is
small.

* Also known as the longitudinal gauge. This transform is very useful when considering
scalar perturbations.



2.2.1 The Eulerian Formalism 

The Eulerian Formalism considers the large scale universe as a continually expanding
fluid, whereby conservation of momentum, energy and mass is encapsulated within
the equations below:

$$ a\dot{\delta} + \nabla\cdot[(1 + \delta)\mathbf{v}] = 0, \tag{2.22} $$

$$ a\dot{\mathbf{v}} + (\mathbf{v}\cdot\nabla)\mathbf{v} + \dot{a}\mathbf{v} = -\nabla\phi - \rho^{-1}\nabla p, \tag{2.23} $$

$$ \nabla^2\phi = 4\pi a^2 G\,\delta\rho. \tag{2.24} $$

Equations (2.22) - (2.24) are the equations of motion of a non-relativistic perfect
fluid in comoving coordinates. Eq. (2.22) is the continuity equation (expressing mass
continuity) and eq. (2.23) is the Euler equation (conservation of the linear momentum).
In this system of differential equations the over-density field,
$\delta(\mathbf{x}, t)$, appears rather than the usual density field,

$$ \rho(\mathbf{x}, t) = \bar{\rho}(t)\,[1 + \delta(\mathbf{x}, t)], $$

with $\bar{\rho}$ being the spatially averaged mean density, and the peculiar velocity,
$\mathbf{v}(\mathbf{x}, t)$, is defined as

$$ \mathbf{v} = \frac{d\mathbf{r}}{dt} - \frac{\dot{a}}{a}\,\mathbf{r}. \tag{2.25} $$

Over-dots indicate partial time derivatives. The pressure $p$ is related to the density
$\rho$ through equation (2.20). For adiabatic perturbations there are no spatial
variations in the equation of state, therefore
$\nabla p = w\nabla\rho = w\bar{\rho}\,\nabla\delta$.

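The Linear regime of Adiabatic perturbations

Linearising our system of equations, we obtain

$$ a\dot{\delta} + \nabla\cdot\mathbf{v} \simeq 0, \tag{2.26} $$

$$ a\dot{\mathbf{v}} + \dot{a}\mathbf{v} \simeq -\nabla\phi - c_s^2\nabla\delta, \tag{2.27} $$

$$ \nabla^2\phi = 4\pi G a^2\,\delta\rho, \tag{2.28} $$

where $c_s$ represents the adiabatic sound speed, $c_s^2 = (\partial p/\partial\rho)_S = w$,
and the subscript $S$ indicates constant entropy throughout the space ($\nabla S = 0$).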

A general vector field may be decomposed into a (potential) longitudinal and a
(rotational) transversal part:

$$ \mathbf{v}(\mathbf{x}, t) = \mathbf{v}_\parallel + \mathbf{v}_\perp, \qquad \nabla\times\mathbf{v}_\parallel = \nabla\cdot\mathbf{v}_\perp = 0. \tag{2.29} $$

From the curl of eq. (2.27) it follows that

$$ \frac{\partial}{\partial t}\left(a\nabla\times\mathbf{v}\right) = \frac{\partial}{\partial t}\left(a\nabla\times\mathbf{v}_\perp\right) = 0. \tag{2.30} $$

This implies that rotational modes are not coupled to density perturbations and
decay as $a^{-1}$. Combining the time derivative of the linearised continuity equation
with the divergence of the linearised Euler equation (cf. eq. (2.27)) yields the
equation of motion for the longitudinal density perturbations

$$ \ddot{\delta} + 2\frac{\dot{a}}{a}\dot{\delta} = \left(\frac{c_s}{a}\right)^2\nabla^2\delta + 4\pi G\,\delta\rho. \tag{2.31} $$
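The combination described above can be spelled out in a few lines. This is a sketch of
the intermediate algebra, assuming the linearised equations in the form
$a\dot{\delta} + \nabla\cdot\mathbf{v} = 0$,
$a\dot{\mathbf{v}} + \dot{a}\mathbf{v} = -\nabla\phi - c_s^2\nabla\delta$ and
$\nabla^2\phi = 4\pi G a^2\,\delta\rho$:

```latex
\begin{align}
a\ddot{\delta} + \dot{a}\dot{\delta}
  &= -\nabla\cdot\dot{\mathbf{v}}
  && \text{(time derivative of continuity)} \\
a\,\nabla\cdot\dot{\mathbf{v}} + \dot{a}\,\nabla\cdot\mathbf{v}
  &= -\nabla^2\phi - c_s^2\nabla^2\delta
  && \text{(divergence of Euler)} \\
-a\left(a\ddot{\delta} + \dot{a}\dot{\delta}\right) - a\dot{a}\dot{\delta}
  &= -4\pi G a^2\,\delta\rho - c_s^2\nabla^2\delta
  && \text{(using } \nabla\cdot\mathbf{v} = -a\dot{\delta}\text{ and Poisson)} \\
\ddot{\delta} + 2\frac{\dot{a}}{a}\dot{\delta}
  &= \left(\frac{c_s}{a}\right)^2\nabla^2\delta + 4\pi G\,\delta\rho .
\end{align}
```

The last line follows on dividing by $-a^2$.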

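Since the coefficients are spatially homogeneous (independent of $\mathbf{x}$) this
equation may be solved by expanding $\delta(\mathbf{x}, t)$ in plane waves,
$\delta(\mathbf{x}, t) = \delta_k(t)\,e^{i\mathbf{k}\cdot\mathbf{x}}$,
$\lambda = 2\pi a(t)/k$, where $\lambda$ is the proper wavelength. After some
straightforward calculations, it is easy to show that the dynamical behaviour of
$\delta_k(t)$ obeys the following differential equation:

$$ \ddot{\delta}_k + 2\frac{\dot{a}}{a}\dot{\delta}_k = -\frac{c_s^2}{a^2}\left(k^2 - k_J^2\right)\delta_k, \tag{2.32} $$

where we have defined the comoving Jeans wavenumber $k_J$ by

$$ k_J = \left(\frac{4\pi G\bar{\rho}\,a^2}{c_s^2}\right)^{1/2}. \tag{2.33} $$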
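The 'straightforward calculations' amount to a single substitution. As a sketch,
assuming the perturbation equation in the form
$\ddot{\delta} + 2(\dot{a}/a)\dot{\delta} = (c_s/a)^2\nabla^2\delta + 4\pi G\bar{\rho}\,\delta$
(with $\delta\rho = \bar{\rho}\,\delta$) and a comoving wavevector $\mathbf{k}$:

```latex
\begin{align}
\nabla^2\left[\delta_k(t)\,e^{i\mathbf{k}\cdot\mathbf{x}}\right]
  &= -k^2\,\delta_k(t)\,e^{i\mathbf{k}\cdot\mathbf{x}} \\
\ddot{\delta}_k + 2\frac{\dot{a}}{a}\dot{\delta}_k
  &= \left(-\frac{c_s^2 k^2}{a^2} + 4\pi G\bar{\rho}\right)\delta_k
   = -\frac{c_s^2}{a^2}\left(k^2 - k_J^2\right)\delta_k ,
  \qquad k_J^2 \equiv \frac{4\pi G\bar{\rho}\,a^2}{c_s^2} .
\end{align}
```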

Two qualitative behaviours of the solutions can be easily discerned from eq. (2.32).
For wavenumbers larger than $k_J$ pressure dominates the right hand term and
perturbations do not grow, but merely oscillate. For $k < k_J$ self-gravity dominates
so that gravitational instability can take place. Exact solutions to eq. (2.32) exist
for a variety of cases (see, for instance, Peebles 1980). Since the dynamical behaviour
of $\delta_k(t)$ is governed by a second order differential equation, in general,
there is one monotonically growing solution and one monotonically decaying solution.
In the limit $k \ll k_J$ the effects of the pressure $p$ are negligible and thus all
modes grow at the same rate. In this regime, the general solution to eq. (2.31) is
given by

$$ \delta(t, \mathbf{x}) = A(\mathbf{x})D_+(t) + B(\mathbf{x})D_-(t) \simeq A(\mathbf{x})D_+(t), \tag{2.34} $$

where $D_+(t)$ and $D_-(t)$ are the growing and decaying modes, respectively, while
$A(\mathbf{x})$ and $B(\mathbf{x})$ are time independent functions (Heath 1977). The
decaying solution is a perturbation with initial over-density and peculiar velocity
arranged so its initial velocity quickly becomes negligible (Peebles 1980). Thus for
most of the history of the universe the growing solution dominates. In an Einstein-de
Sitter universe $D_+(t) \propto a$ and $D_-(t) \propto a^{-3/2}$. For a dust universe
with $\Omega < 1$, the growing mode is

$$ D_+(t) = \frac{3\sinh\eta\,(\sinh\eta - \eta)}{(\cosh\eta - 1)^2} - 2, \tag{2.35} $$

where $\eta$ indicates conformal time, $\eta = (-k)^{1/2}\int dt'/a(t')$.
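The Einstein-de Sitter scalings quoted above, $D_+ \propto a$ and
$D_- \propto a^{-3/2}$, can be checked numerically. The sketch below integrates the
pressureless perturbation equation
$\ddot{\delta} + 2H\dot{\delta} = 4\pi G\bar{\rho}\,\delta$ with a hand-written
fourth-order Runge-Kutta scheme. It assumes an Einstein-de Sitter background with
$a \propto t^{2/3}$, so that $H = 2/(3t)$ and $4\pi G\bar{\rho} = \tfrac{3}{2}H^2$, in
units where the starting time is $t = 1$; the function names, step count and initial
conditions are illustrative choices.

```python
def growth_rhs(t, delta, ddelta):
    """Second time derivative of delta in a pressureless Einstein-de Sitter model."""
    hubble = 2.0 / (3.0 * t)                      # H = 2 / (3 t)
    return -2.0 * hubble * ddelta + 1.5 * hubble ** 2 * delta


def integrate_growth(t0, t1, delta, ddelta, steps=20000):
    """Classical RK4 for the pair (delta, d delta / dt) from t0 to t1."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        k1d, k1v = ddelta, growth_rhs(t, delta, ddelta)
        k2d = ddelta + 0.5 * h * k1v
        k2v = growth_rhs(t + 0.5 * h, delta + 0.5 * h * k1d, k2d)
        k3d = ddelta + 0.5 * h * k2v
        k3v = growth_rhs(t + 0.5 * h, delta + 0.5 * h * k2d, k3d)
        k4d = ddelta + h * k3v
        k4v = growth_rhs(t + h, delta + h * k3d, k4d)
        delta += h * (k1d + 2.0 * k2d + 2.0 * k3d + k4d) / 6.0
        ddelta += h * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        t += h
    return delta, ddelta


# Start from rest: a mixture of 3/5 growing mode and 2/5 decaying mode.
t_end = 64.0
delta_end, ddelta_end = integrate_growth(1.0, t_end, 1.0, 0.0)

a_ratio = t_end ** (2.0 / 3.0)                         # growth of the scale factor
growth_rate = 1.5 * t_end * ddelta_end / delta_end     # d ln(delta) / d ln(a)
print(f"a grew by {a_ratio:.1f}, delta by {delta_end:.3f}, d ln delta/d ln a = {growth_rate:.4f}")
```

Starting from rest the analytic solution is
$\delta = \tfrac{3}{5}t^{2/3} + \tfrac{2}{5}t^{-1}$, so by the end of the run the
decaying part is negligible and the logarithmic growth rate approaches unity, as
expected for $\Omega = 1$.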

Given a solution for the density perturbation field $\delta(\mathbf{x}, t)$, the
velocity, gravitational potential and gravity field follow. For the longitudinal modes
$\mathbf{v} = \mathbf{v}_\parallel = -\nabla\phi_v/a$. The gravity field is
$\mathbf{g} = -\nabla\phi/a$. Thus, from the system of differential equations, eqs.
(2.26) - (2.28), for $k \ll k_J$, we obtain

$$ \phi_v = -\frac{a^2 H f}{4\pi}\int d^3x'\,\frac{\delta(\mathbf{x}')}{|\mathbf{x}' - \mathbf{x}|}, \qquad \phi = \frac{3\Omega H}{2f}\,\phi_v, \qquad \mathbf{g} = \frac{3\Omega H}{2f}\,\mathbf{v}, \tag{2.36} $$

where $f$ is defined by

$$ f(\Omega, z) = \frac{d\ln D_+}{d\ln a}. \tag{2.37} $$

The behaviour of $f(\Omega, z)$ at the present epoch ($z = 0$) is very well described
by $f \approx \Omega^{0.6}$ in the case of universes with negligible space curvature or
rather small cosmological constant (Peebles 1980).



2.3 N-body Simulations

So far we have only been concerned with the linear regime. To tackle the non-linear
regime we must leave our analytical tools behind and instead buy a very big computer
to run N-body simulations. This is exactly what people have been doing since Sverre
Aarseth wrote the first astrophysical N-body codes (Aarseth, 1978).*

In the early days of this field, cosmological simulations could only handle roughly
1,000 particles (Gott & Turner, 1979). But with the exponential increase in computing
power it was not long until very large N-body simulations were being carried out.
By the mid-1980s there were 30,000 particles being used (Efstathiou, 1985); this rose
steeply to $10^6$ in the 1990s (Bertschinger & Gelb, 1991) and now the most recent
effort by the "Virgo Consortium" has seen the first billion particle cosmological
simulation (Evrard et al. 2002).

Over the last two decades cosmological N-body simulations have played a crucial
role in the study of the formation and evolution of cosmic structure. Primarily they
have been used to match theory with observations. However, simulations like the
recent "millennium run" (Springel et al. 2005) use only dark matter particles, which
form the gravitational potentials for structure growth. Therefore, there is a crucial
step to go from dark matter particles to dark matter halos and finally individual
galaxies. The basic technique for doing so is discussed in §5.3.1. One would hope
that in this process we are not masking some underlying physics.

The impressive progress achieved on the observational front with the completion
of very large surveys such as 2dF and SDSS poses a clear challenge to the numerical
work in cosmology: the precision of the predictions provided by the current N-body
experiments has to be of the order of a few percent.

N-body simulations, however, do not include all relevant physics, such as magnetic
fields and gas dynamics. They also have resolution issues, which must play a role on
small scales. So while modern simulations are undoubtedly very useful tools, they
must be used with care.



^1 Aarseth helped greatly in the early days of N-body simulations by making his codes readily
available and easily adaptable.

Chapter 3

Quantifying The Large Scale 
Structure 

The issue of quantifying structure is not confined to cosmology; it is a complex matter
which stretches across many areas of science. Patterns are there to be identified and
exploited in order to find subtle connections and correlations between observables.
This method of analysis can be seen in fields as diverse as studying the stock market
and modelling biological systems. A wide variety of different statistical tools have
been employed to quantify structure, but in studies of the large-scale distribution of
galaxies perhaps the most common has been the 2-point correlation function.

3.1 The Two Point Correlation Function: $\xi$

The two point correlation function (hereafter 2PCF), $\xi(r)$, is defined as the excess
probability, with respect to a Poisson distribution, of finding two galaxies in volumes
$dV_1$ and $dV_2$ separated by a distance $r$. The joint probability is then,

$$dP_{1,2} = n^2[1 + \xi(r)]\,dV_1\,dV_2, \tag{3.1}$$

where $n$ is the average density of the sample. This is, however, not a straightforward
working definition since we know that the density varies through space. Consequently
it is not straightforward to identify how large a volume of space must be
sampled in order to reliably measure the average density of the galaxy population.
Equally importantly, even if such a 'fair sample' volume can be identified, the number
count of observable galaxies within that volume will not generally be a reliable
estimate of the 'true' number of galaxies in the volume because of observational
selection effects - i.e. a radial selection function and angular masked regions. Also,
measuring accurately the distance of galaxies in a survey is not straightforward:
redshifts are distorted by peculiar motions and redshift-independent distance indicators
(e.g. using some form of 'standard candle' assumption) are rather noisy.

For this work we proceed to measure the mean density $n$ as the number of galaxies
inside the survey, each weighted by the selection function, $\phi(r_i)$; in practice we
perform a Monte Carlo integration over the sample space (see Strauss & Willick, 1995 and
references therein). We can also write the conditional probability, $dP_c$, of finding a
galaxy in a volume $dV_1$, at a distance $r$ from another galaxy,

$$dP_c = n[1 + \xi(r)]\,dV_1. \tag{3.2}$$

From this equation it is straightforward to see the properties of the function $\xi(r)$.
For $\xi(r) = 0$, we recover a uniformly random point set, such that the probability
of finding a galaxy in volume $dV$ is simply proportional to $dV$. The case of excess
clustering is $\xi(r) > 0$, where we have more galaxies than in the Poisson case. Then
there is also the case where $\xi(r) < 0$. This case corresponds to anti-clustering, and
could be relevant e.g. to some models of galaxy formation where the formation of a
galaxy may inhibit the formation of other galaxies in its vicinity.

In general the observed $\xi(r)$ is well described by a power law scaling with distance
and it is standard practice in the literature to write the 2PCF as,

$$\xi(r) = \left(\frac{r_0}{r}\right)^{\gamma}. \tag{3.3}$$

Here $r_0$ is a characteristic scale length, usually evaluated where $\xi = 1$. This description
is, however, an oversimplification, as the physics of structure formation is richer in
complexity than a simple power law.

One great advantage of using this statistic is that the Fourier transform of
$\xi(r)$ gives the power spectrum, $P(k)$, of density anisotropies (see §3.1.4). It is also
a statistic which is very easy to visualise and very easy to compute. From $\xi(r)$ it
is also possible to determine the correlation dimension, $D_2$, of the discrete point
distribution. $D_2$ is calculated via,

$$D_2 = 3 + \frac{d[\log\xi(r)]}{d[\log(r)]}. \tag{3.4}$$

For a homogeneous distribution in 3-D we would expect $D_2 = 3$. However, at scales
around $r_0$, $\gamma \approx 1.77$ (Davis & Peebles, 1983), corresponding to a dimensional value
$D_2 \approx 1.23$. We will soon see where expression (3.4) comes from when we discuss
generalised dimensions in §3.4.
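
As a quick numerical illustration of expression (3.4), the sketch below evaluates the
logarithmic slope of a power-law $\xi(r)$ by finite differences. The function name
`correlation_dimension` and the value $r_0 = 5$ are illustrative assumptions, not
quantities taken from this work.

```python
import numpy as np

def correlation_dimension(r, xi):
    """Estimate D_2 = 3 + d log(xi) / d log(r) by finite differences."""
    slope = np.gradient(np.log(xi), np.log(r))
    return 3.0 + slope

# Illustrative power-law 2PCF, xi(r) = (r0 / r)**gamma, as in eq. (3.3).
r0, gamma = 5.0, 1.77            # r0 is an arbitrary, assumed scale length
r = np.logspace(-1, 1, 50)       # separations in the same units as r0
xi = (r0 / r) ** gamma

d2 = correlation_dimension(r, xi)
print(d2.mean())                 # close to 3 - gamma = 1.23
```

For a pure power law the slope is constant, so every element of `d2` equals $3 - \gamma$ up to
rounding; for measured data the slope would vary with scale.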



3.1.1 $\xi$-Correlation Estimators

There are a few different ways to measure $\xi(r)$ in the literature, but the basic
computational structure is more or less the same. The main differences between estimators
are usually the way in which they deal with 'edge effects' and the 'shot noise'. We
are looking for correlations between galaxies, so we begin by centring on
a galaxy and proceed to count the number of galaxies within spherical shells of
different radii around the central object. This procedure is repeated by centring on
all, or a randomly chosen subset of all, the galaxies in the catalogue to obtain a
statistical average. The number of galaxies counted in each shell is then normalised
by a Poisson term which is related to the mean density. This estimator is written as,

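$$1 + \xi(r) = \frac{dP_c}{n\,dV} = \frac{DD}{DR}, \tag{3.5}$$

$$1 + \xi_i(r) = \sum_{j}\frac{\Theta(|\mathbf{r}_j - \mathbf{r}_i| - r)\,\Theta(r + dr - |\mathbf{r}_j - \mathbf{r}_i|)}{n(r)\,dV(r)\,\phi(r_i)\,\phi(r_j)}, \tag{3.6}$$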

where $DD$ are data-data pairs, $DR$ are data-random pairs, $N$ is the number of
galaxies, $\phi$ is the selection function and

$$\Theta(x) = \begin{cases} 0, & x < 0, \\ 1, & x > 0. \end{cases} \tag{3.7}$$



The problem with eq. (3.6) lies within the mean density term. To obtain $n$, the most
accurate method should be to sum over all the galaxies while weighting each by the
inverse of the selection function. This is calculated as follows,

$$n = \frac{1}{V}\sum_{i=1}^{N}\frac{1}{\phi(r_i)}. \tag{3.8}$$

In the last equation, $V$ is the volume and the sum is over all galaxies. To reduce
the variance associated with $\xi(r)$, the average can be taken,

$$\langle\xi(r)\rangle = \frac{1}{N}\sum_{j}\xi_j(r). \tag{3.9}$$

The angled brackets represent a statistical average, which usually invokes the use of
the cosmological ergodic theorem (see §3.1.2). The result of applying eq. (3.6) to a
mock galaxy catalogue can be seen in figure 3.1. The red line is a straight line fit to
the data, giving $\gamma \approx 1.6$.
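
The weighted mean density of eq. (3.8) is simple to sketch in code. The selection function
below is a made-up exponential chosen purely for illustration; `mean_density` and `phi` are
hypothetical names, and a real survey would supply its own selection function and volume.

```python
import numpy as np

def mean_density(radii, phi, volume):
    """Eq. (3.8): n = (1 / V) * sum_i 1 / phi(r_i) over all observed galaxies."""
    return np.sum(1.0 / phi(radii)) / volume

# Hypothetical selection function: fraction of galaxies observable at distance r.
phi = lambda r: np.exp(-r / 100.0)

radii = np.array([10.0, 50.0, 120.0])    # toy galaxy distances (assumed to be in Mpc)
print(mean_density(radii, phi, volume=1.0e6))
```

Galaxies at large distances, where few are observable, receive a large weight, so each one
stands in for the unseen population around it.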

Figure 3.1: The 2PCF measured using (3.6) and applied to the PSCz mock galaxy
catalogue. The gradient gives $\gamma \approx 1.6$ and therefore $D_2 \approx 1.4$ on scales up to 50 Mpc.

So far we have used $DD$ and $DR$; however, $RR$ pairs are also useful in calculating
the 2PCF. Other estimators, like the minimum variance estimator of Landy & Szalay
(1993), use the $RR$ pairs to help correct for boundary effects. Their estimator takes
the form,

$$\xi(r) = \frac{M(M-1)}{N(N-1)}\frac{DD}{RR} - \frac{(M-1)}{N}\frac{DR}{RR} + 1, \tag{3.10}$$

where $M$ is the number of random points. There are many more estimators for
the 2PCF. Kerscher, Szapudi, & Szalay (2000) show that equation (3.10) is strongly
preferred over other methods.
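
To make the pair-count bookkeeping of equation (3.10) concrete, here is a minimal
brute-force sketch. It assumes that $DD$ and $RR$ count each unordered pair once and that
$DR$ counts every data-random pair, which is the convention under which the prefactors in
(3.10) are the usual normalisations. The function names are illustrative, and the full
distance matrix is only practical for small samples.

```python
import numpy as np

def pair_counts(a, b, edges, same=False):
    """Histogram of pair separations between point sets a and b.
    With same=True, a and b are one set and each unordered pair is counted once."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    if same:
        d = d[np.triu_indices(len(a), k=1)]
    return np.histogram(d.ravel(), bins=edges)[0].astype(float)

def landy_szalay(data, rand, edges):
    """Eq. (3.10) evaluated in the separation bins defined by `edges`."""
    n, m = len(data), len(rand)
    dd = pair_counts(data, data, edges, same=True)
    rr = pair_counts(rand, rand, edges, same=True)
    dr = pair_counts(data, rand, edges)
    return (m * (m - 1)) / (n * (n - 1)) * dd / rr - (m - 1) / n * dr / rr + 1.0

# Toy usage: an unclustered "catalogue" compared with a denser random set.
rng = np.random.default_rng(0)
data = rng.random((200, 3))
rand = rng.random((400, 3))
print(landy_szalay(data, rand, np.linspace(0.1, 0.5, 5)))   # scatters around zero
```

Bins in which no random-random pairs fall would give a division by zero, so in practice the
random catalogue should be dense enough to populate every bin.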



3.1.2 Ergodicity 

A short note on the Ergodic Principle: 

The observed universe is unique. This implies that averages have to be spatial 
ones. Such averages will be equal to those obtained if instead we were to average 
over an ensemble of universes if the Cosmic Ergodic Theorem holds. Ergodicity in 
the cosmological context means that ensemble averaging and spatial averaging are 
equivalent. Note that, in contrast with the common practice in statistical mechanics, 
the cosmological Ergodic Hypothesis refers to the spatial distribution of a random 
field at a fixed time rather than to the time evolution of the system. Thus, for
instance, the ensemble average of the random field $\delta(\mathbf{x})$ at a point $\mathbf{x}$,
$\langle\delta(\mathbf{x})\rangle$, is simply the expectation value of the random variable $\delta(\mathbf{x})$.



3.1.3 Gaussianity 

Let us introduce some of the statistics used by cosmologists to characterise the spatial
distribution of matter. We define the r.m.s. fluctuations, $\sigma_\delta$, of a continuous density field
$\delta(\mathbf{x})$ as

$$\sigma_\delta^2 = \langle\delta(\mathbf{x})^2\rangle, \tag{3.11}$$

and the correlation function by

$$\xi(r_{12}) = \langle\delta(\mathbf{x}_1)\,\delta(\mathbf{x}_2)\rangle. \tag{3.12}$$

(Note that for a homogeneous and isotropic random process $\xi$ only depends on the
distance between the two points, $r_{12} = |\mathbf{x}_1 - \mathbf{x}_2|$.) The correlation function is a
measure of the spatial correlation of the field $\delta(\mathbf{x})$.

A random field is said to be Gaussian if all $N$-point multivariate probability
distribution functions are multivariate Gaussian distributions defined by their mean
vector $\langle\delta(\mathbf{x}_i)\rangle$ (which ergodicity implies to be identically zero) and their
covariance matrix $M_{ij} = \xi(\mathbf{x}_i, \mathbf{x}_j)$. Gaussianity is a very popular assumption for two
reasons. The first is that the calculations are "easy" to perform. The second
is that the CMB seems to support a Gaussian initial density field (Komatsu
et al. 2003).

3.1.4 Power spectrum 

If we expand the $\delta(\mathbf{x})$ field in plane waves as

$$\delta(\mathbf{x}) = \sum_{\mathbf{k}}\delta_{\mathbf{k}}\,e^{i\mathbf{k}\cdot\mathbf{x}}, \tag{3.13}$$

we see that its Fourier transform $\delta_{\mathbf{k}}$ is given by

$$\delta_{\mathbf{k}} = \frac{1}{V_u}\int_{V_u}\delta(\mathbf{x})\,e^{-i\mathbf{k}\cdot\mathbf{x}}\,d^3x, \tag{3.14}$$

where $V_u$ may be thought of as a "fair sample" of the universe. The power spectrum
of the density field $\delta(\mathbf{x})$ is defined as the expectation of the two-point function in
Fourier space, as follows:

$$\langle\delta_{\mathbf{k}_1}\delta^{*}_{\mathbf{k}_2}\rangle = P(k_1)\,\delta_D(\mathbf{k}_1 - \mathbf{k}_2), \tag{3.15}$$

where $\delta_D$ is the well-known Dirac delta function. This implies that even if $\delta_{\mathbf{k}}$ is not a
Gaussian distribution, the random variable $\delta(\mathbf{x})$, being an infinite sum of independent
random variables, will still be Gaussian by the Central Limit Theorem for some
well-behaved power spectra. We can see that the Dirac function in eq. (3.15) is required
because of the translational invariance, $\langle\delta(\mathbf{x}_1)\,\delta(\mathbf{x}_2)\rangle = \xi(|\mathbf{x}_1 - \mathbf{x}_2|)$. Similarly, we
can also see that isotropy implies that $P(k)$ depends only on the magnitude of the
wave-vector $\mathbf{k}$.

3.1.5 Window Functions 

For some calculations it may be necessary to apply a cutoff at high spatial frequencies;
this is due to non-linearities on small scales. The smoothed field $\delta(\mathbf{x}; r_W)$ that may be
obtained by convolution of the "raw" field with some weighting function $W$ (called a
window function) having a characteristic scale $r_W$ is given by

$$\delta(\mathbf{x}; r_W) = \int\delta(\mathbf{x}')\,W(|\mathbf{x}' - \mathbf{x}|; r_W)\,d^3x', \tag{3.16}$$

and has r.m.s. fluctuations given by

$$\sigma^2(r_W) = \frac{1}{2\pi^2}\int P(k)\,\hat{W}^2(k)\,k^2\,dk, \tag{3.17}$$

where $\hat{W}(k)$ is the representation in Fourier space of $W(y; r_W)$. The window function
has the following properties: $W(\mathbf{x}' - \mathbf{x}; r_W) = \mathrm{const.} \simeq r_W^{-3}$ if $|\mathbf{x} - \mathbf{x}'| \ll r_W$, and
$W(\mathbf{x}' - \mathbf{x}; r_W) = 0$ if $|\mathbf{x} - \mathbf{x}'| \gg r_W$, satisfying the relation $\int W(\mathbf{x}' - \mathbf{x}; r_W)\,d\mathbf{y} = 1$.
One of the most common window functions is the "top hat" (TH) window function,
which is defined by the relation

$$W_{TH}(|\mathbf{x} - \mathbf{x}'|; r_{TH}) = \frac{3}{4\pi r_{TH}^3}\,H\!\left(1 - \frac{|\mathbf{x} - \mathbf{x}'|}{r_{TH}}\right), \tag{3.18}$$

where $H$ denotes the Heaviside step function ($H(y) = 0$ if $y < 0$, and $H(y) = 1$ if
$y > 0$). Another commonly used window function is the Gaussian kernel:

$$W_G(|\mathbf{x} - \mathbf{x}'|; r_G) = \frac{1}{(2\pi r_G^2)^{3/2}}\exp\left(-\frac{|\mathbf{x} - \mathbf{x}'|^2}{2r_G^2}\right). \tag{3.19}$$
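
The normalisation condition on the window function can be checked numerically. The sketch
below codes the top hat and Gaussian windows as functions of the separation $y = |\mathbf{x} - \mathbf{x}'|$
and integrates each over all space in spherical shells. The helper `volume_integral`, its
integration limit and its grid size are assumptions made for this illustration.

```python
import numpy as np

def top_hat(y, r_th):
    """Top hat window: constant inside radius r_th, zero outside."""
    return np.where(y < r_th, 3.0 / (4.0 * np.pi * r_th**3), 0.0)

def gaussian(y, r_g):
    """Isotropic 3-D Gaussian window of width r_g."""
    return np.exp(-y**2 / (2.0 * r_g**2)) / (2.0 * np.pi * r_g**2) ** 1.5

def volume_integral(window, scale, y_max=20.0, n=200_001):
    """Integrate W over space in spherical shells: int 4 pi y^2 W(y) dy."""
    y = np.linspace(0.0, y_max, n)
    f = 4.0 * np.pi * y**2 * window(y, scale)
    return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(y))   # trapezoidal rule

print(volume_integral(top_hat, 2.0))    # close to 1
print(volume_integral(gaussian, 1.0))   # close to 1
```

Both windows enclose unit volume-weighted area, so smoothing with either leaves the mean of
the field unchanged.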



3.2 Higher Order Correlations 



It is then natural to ask whether there are higher order correlations than
the simple 2PCF, $\xi(r)$. The answer is most definitely yes.
A Gaussian random field would in principle be completely defined by the 2PCF
(the initial density field is thought to have this property); however, due to non-linear
structure formation we now have local non-Gaussianity, which means we must look
to higher order moments to accurately quantify the galaxy distribution.

These higher order correlations are defined as $\xi_n(\mathbf{r}_1, \ldots, \mathbf{r}_n)$, where $n$ is the order
of the correlation function. As an example, the 3 point correlation function is defined
through the joint probability of there being a galaxy in each of the volume elements $dV_1$
and $dV_2$, given that these elements have displacements $\mathbf{r}_1$ and $\mathbf{r}_2$ from the galaxy which
is being investigated. This is illustrated in fig. 3.2. The joint probability can be written as,

$$dP = n^3[1 + \xi(r_a) + \xi(r_b) + \xi(r_c) + \zeta(r_a, r_b, r_c)]\,dV_1\,dV_2\,dV_3, \tag{3.20}$$

where $r_a$, $r_b$ and $r_c$ are the sides of the triangle. Assuming homogeneity and isotropy
leads to $\zeta$ being a symmetric function of the three lengths. Equation (3.20)
is the full three-point correlation function, and $\zeta$ is known as the reduced part.

Figure 3.2: The distances $r_1$, $r_2$ and $r_3$ separating the three galaxies. The triangular
configuration can be fixed at the outset or it could be included as a variable at the cost of
much more computation.



The conditional probability of finding two objects to complete the triangular
configuration, given that we are centring on a galaxy, is,

$$dP = n^2[1 + \xi(r_a) + \xi(r_b) + \xi(r_c) + \zeta(r_a, r_b, r_c)]\,dV_2\,dV_3. \tag{3.21}$$

Then the conditional probability of finding a galaxy to complete the triangle, given
that we have a pair of galaxies with separation $r_a$, is,

$$dP = n\,\frac{1 + \xi(r_a) + \xi(r_b) + \xi(r_c) + \zeta(r_a, r_b, r_c)}{1 + \xi(r_a)}\,dV_3. \tag{3.22}$$

In extending to $N$-point correlation functions the computational load increases
exponentially. This limits our ability to determine accurately $N > 4$; however, there
may be a way to avoid this problem by making some approximations.

Given that $\xi(r)$ can be represented by a power law,

$$\xi(r) = Br^{-\gamma}, \qquad \gamma \approx 1.77, \tag{3.23}$$

the 3PCF is then found to be well described by a combination of $\xi$'s (Peebles, 1980),
i.e.,

$$\zeta(r_a, r_b, r_c) = Q[\xi(r_a)\xi(r_b) + \xi(r_b)\xi(r_c) + \xi(r_c)\xi(r_a)], \tag{3.24}$$

with $Q \approx 1.0 \pm 0.2$ (Meiksin, Szapudi & Szalay, 1992).
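
The hierarchical form of the reduced 3PCF is easy to evaluate once a model for the 2PCF is
chosen. The sketch below uses the power law of the preceding paragraph with an assumed
amplitude $B = 1$; the function names and default values are illustrative only.

```python
import math

def xi_power_law(r, b=1.0, gamma=1.77):
    """Power-law 2PCF, xi(r) = B * r**(-gamma)."""
    return b * r ** (-gamma)

def zeta_hierarchical(ra, rb, rc, q_amp=1.0, xi=xi_power_law):
    """Reduced 3PCF as a symmetric sum of products of 2PCFs, scaled by q_amp."""
    xa, xb, xc = xi(ra), xi(rb), xi(rc)
    return q_amp * (xa * xb + xb * xc + xc * xa)

print(zeta_hierarchical(1.0, 1.0, 1.0))   # equilateral triangle of unit side: 3.0
```

Because the expression is a sum over all pairs of sides, the result does not depend on the
order in which the three side lengths are supplied.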

3.3 Minimal Spanning Trees 

One drawback with the 2PCF, as we have defined it in eq. (3.1), is that it is insensitive
to filamentary structure. This is due to it being a function only of distance and not 
direction; thus all angular information is lost through the averaging. The Universe 
does appear to contain filaments, walls and other such features, but whether these 
are real or due to chance alignments has of course been a topic of debate over recent 
decades. 

To quantify this kind of structure Barrow, Bhavsar and Sonoda (1985) introduced 
a method from Graph Theory, called Minimal Spanning Trees (MST). The procedure 
to implement this is as follows: 

1 A galaxy A is chosen as a starting point within a 3-D galaxy distribution.

2 The nearest galaxy to A is labelled B and a straight line is drawn to
connect the two. This line is known as a path.

3 The closest galaxy to the set of previous galaxies (in this case A and B)
is added to the set and is connected to the closest galaxy in the set. This
produces a branching behaviour, and this step is repeated.

4 After some time we are left with many paths and circuits (closed paths).
If there are no circuits the graph is open and this is known as a tree.

5 To then transform this abstract visualisation into a numerical representation
of structure we can do a few things, e.g.:

i Calculate the number of lines in angular bins of $d\theta$ w.r.t. an
adjoining line.

ii Calculate the number of lines in distance bins, $dr$, obtaining
$dN$ vs $dr$.

Figure 3.3: On the left hand panel we see a distribution of points. As we apply the method
from 1-5 the Minimal Spanning Tree is constructed on the right hand panel.

This method connects all galaxies in some fashion. But we know that not all 
galaxies are physically connected, as a galaxy in one cluster has little to do with 
another galaxy in a distant cluster. 

To account for this, we adjust the previous method by only joining a galaxy to
a pre-existing tree if its distance to the closest member is less than some threshold
distance. This technique is known as separating and was introduced by Clark &
Miller (1966). This method was recently applied to the SDSS DR1 by Doroshkevich
et al. (2004). They found that groups and clusters are more likely to be found close
to walls rather than filamentary structure.
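
The tree-building steps described above correspond to what is usually called Prim's
algorithm, and a minimal sketch is given here. The function names are illustrative, and
`separate` is only one simple way to mimic the separating step, by discarding links longer
than the threshold after the full tree has been built.

```python
import numpy as np

def minimal_spanning_tree(points):
    """Grow the tree by repeatedly attaching the point closest to any point
    already in the tree. Returns a list of edges (i, j, length)."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True                                   # starting galaxy
    best_dist = np.linalg.norm(pts - pts[0], axis=1)    # distance to the tree
    best_link = np.zeros(n, dtype=int)                  # closest tree member
    edges = []
    for _ in range(n - 1):
        candidates = np.where(in_tree, np.inf, best_dist)
        j = int(np.argmin(candidates))                  # nearest outside point
        edges.append((int(best_link[j]), j, float(best_dist[j])))
        in_tree[j] = True
        d_new = np.linalg.norm(pts - pts[j], axis=1)
        closer = ~in_tree & (d_new < best_dist)
        best_dist[closer] = d_new[closer]
        best_link[closer] = j
    return edges

def separate(edges, threshold):
    """Drop links longer than the threshold, splitting the tree into groups."""
    return [e for e in edges if e[2] <= threshold]

pts = [(0, 0, 0), (1, 0, 0), (3, 0, 0), (6, 0, 0)]
tree = minimal_spanning_tree(pts)
print([length for _, _, length in tree])   # [1.0, 2.0, 3.0]
```

The list of edge lengths returned here is exactly what is needed to build the $dN$ vs $dr$
histogram mentioned in step 5.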






3.4 The Fractal Universe 

Fractal patterns can be thought of as the place where chaos and order meet. This 
is because self-similarity (fractality) seems to be an eventual by-product of chaotic 
systems. Fractal pictures are usually associated with "Julia" and "Mandelbrot" sets 
named after the French mathematicians. 




Figure 3.4: Left: The Julia set. Right: The Mandelbrot set. In both the Julia & the 
Mandelbrot sets the self-similarity is clearly apparent i.e. successive enlargements of areas 
will show similar patterns to the picture as a whole. 

Fractals belong to a branch of mathematics known as Fractal Geometry. Unlike 
the usual definition of Geometry, where regular shapes and patterns are studied, frac- 
tals can be highly irregular and the dimensions which are explored are not confined 
to integer values like D = 2 for surfaces and D = 3 for volumes. The term fractal 
was originally coined by Mandelbrot himself, who is widely considered the father 
of fractal studies. In fig. 3.4 the Mandelbrot and Julia sets are shown. These are
classical examples of fractal patterns; we can easily see that zooming in on certain
areas will yield patterns which are reproductions of the whole picture. It is this
self-similarity in fractal patterns which makes them scale-invariant objects. Multifractals
on the other hand are not scale invariant, since their spatial dimension can vary with 
scale length. In these terms monofractals can be thought of as a special case of a 
more general multifractal. 

Fractals were merged with the physical sciences through their intimate connection
to nonlinear physics and chaotic dynamics. In configuration space we may have
chaotic motion for an unstable system; however, in phase space the dynamics may be
more ordered and the system may evolve towards an attractor. This is rather like
a chaotic analogue of an equilibrium state. Phase space trajectories are, however, by
no means an easy-to-use tool for analysing structures since, in cosmology, we only
have access to spatial coordinates^1. Nevertheless, this kind of ordered phase space
gives rise to a self similar pattern in real space. This is, more specifically, a fractal
pattern.

Now it is known that on large scales, the main ingredient to structure formation 
is the force of gravity. However the 1/r potential for gravity leads to highly nonlinear 
motions and also to cross talking between different spatial scales. So it is not so great 
a jump to consider the galaxy distribution as a fractal of some kind. In fact this line 
of analysis is not a new one. The distribution of galaxies in the universe has already 
been shown to be well described using a multifractal framework, see e.g. Jones et 
al. 1988. 

3.4.1 Multifractal Formalism 

In this analysis we will adopt the procedure laid out in Henschel et al. (1983) to
determine the Renyi (generalised) dimensions of a point set embedded in a
three-dimensional Euclidean space. The probability of a galaxy, $j$, being within a sphere
of radius $r$ centred on galaxy $i$ is,

$$p_i(r) = \frac{n_i(r)}{N} = \frac{1}{N}\sum_{j=1}^{N}\Theta(|\mathbf{r}_i - \mathbf{r}_j| - r). \tag{3.25}$$

^1 This is not entirely true as we do have limited velocity information as well.

Here $n_i(r)$ is the number of galaxies within radius $r$, $N$ is the total number of galaxies
and

$$\Theta(x) = \begin{cases} 1, & x < 0, \\ 0, & x > 0. \end{cases} \tag{3.26}$$
Equation (3.25) can then be related to the partition sum via Grassberger and
Procaccia's (1983) correlation algorithm,

$$Z(q, r) = \frac{1}{M}\sum_{i=1}^{M}[p_i(r)]^{q-1} \propto r^{\tau(q)}. \tag{3.27}$$

In this case $M$ is the number of counting spheres and $q$ defines the generalised
dimension we are investigating. $\tau(q)$ is the scaling exponent, which is then related
to the infinite set of dimensions through,

$$D_q = \frac{\tau(q)}{q - 1}, \qquad q \neq 1. \tag{3.28}$$

Clearly the special case of $q = 1$, the information dimension, cannot be determined
using the above expression but can be found approximately in the limit $q \to 1$.
This is an important dimension to calculate as it gives equal weighting to voids and
clusters. Voids are enhanced for $q < 1$ and clusters are enhanced for $q > 1$, so
$q = 1$ is in this sense the most unbiased dimension in the set. To determine $D_1$ more
accurately we must calculate,

$$S(r) = \frac{1}{M}\sum_{i=1}^{M}\log[p_i(r)] \propto D_1\log r, \tag{3.29}$$

where $S(r)$ is the partition entropy of the point set.
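
A minimal sketch of how the partition sum and the generalised dimensions might be estimated
for a small point set is given below. It uses every point as a counting-sphere centre
($M = N$), applies no boundary correction of any kind, and fits a single straight line over
the supplied radii; the function names and the toy point set are assumptions made for
illustration and are not the code used in this work.

```python
import numpy as np

def sphere_probabilities(points, radii):
    """p_i(r) = n_i(r) / N for every point i and radius r.
    Returns an array of shape (len(radii), N); each point counts itself."""
    pts = np.asarray(points, dtype=float)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    return np.array([(dist < r).mean(axis=1) for r in radii])

def generalised_dimension(points, radii, q):
    """Fit the slope of log Z(q, r), or of S(r) when q = 1, against log r."""
    p = sphere_probabilities(points, radii)
    log_r = np.log(radii)
    if q == 1:
        s = np.log(p).mean(axis=1)              # partition entropy S(r)
        return np.polyfit(log_r, s, 1)[0]
    z = (p ** (q - 1)).mean(axis=1)             # partition sum Z(q, r)
    tau = np.polyfit(log_r, np.log(z), 1)[0]    # scaling exponent tau(q)
    return tau / (q - 1)

# Toy check: points evenly spaced along a line should give D_q close to 1.
line = np.column_stack([np.linspace(0.0, 1.0, 1000), np.zeros(1000), np.zeros(1000)])
radii = np.array([0.0105, 0.0155, 0.0205, 0.0305, 0.0405])
print(generalised_dimension(line, radii, q=2))
```

The estimate falls slightly below 1 because spheres centred near the ends of the line are
partly empty, which is a small-scale preview of the boundary problem treated in Chapter 4.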

In §3.1 the 2PCF was related to the correlation dimension, $D_2$, through a
differential operation. However, in multifractal terms the correlation dimension is only
one of an infinite number of generalised dimensions which we can use to characterise
the distribution. Other important dimensions include $D_1$, the information dimension,
$D_0$, the capacity dimension, and the limits $q \to \pm\infty$. Multifractal distributions are usually
defined by a $D_q$ curve which generally decreases with $q$ (see figure 3.5). Monofractals
on the other hand produce a flat $D_q$ curve.

Figure 3.5: Applying the multifractal analysis to a ΛCDM halo catalogue, we obtain the
usual $D_q$ curve. A linear fit to the partition sum, $Z(q, r)$, over the two distance scales
produces each point in the plots above. The catalogue has approximately 1.5 million halo
positions.

The methodology we have constructed is used to calculate the $D_q$ curve for the
ΛCDM N-body simulations. The two plots in figure 3.5 show $D_q$ curves for different
distance scales. A minimum $\chi^2$ approach was applied to the partition sum, $Z(q, r)$,
as illustrated in figure 3.6(a).

The $\chi^2$ minimisation was performed on a straight line model fit to the data at
two different distance scales: $10 < R_1 < 40$ Mpc and $50 < R_2 < 100$ Mpc. Both $R_1$
and $R_2$ were chosen as they appeared to have constant gradient in these regions. At
small scales the data seem to support a multifractal distribution, whereas on
large scales the universe appears to reach homogeneity on scales considerably smaller
than the size of the box.

3.4.2 Other Estimators



The partition sum can also be calculated using a counts-in-cells approach
(Mandelbrot, 1982). Define a new measure of a discrete point set as,

$$\mu_i = \frac{N_i}{N_{tot}}, \tag{3.30}$$

which is just the fraction of all galaxies $N_{tot}$ contained within cell $i$. Also, if the
relation $\sum_i \mu_i = 1$ is satisfied, then it must follow that the cells cover the entire
space (of topological dimension $d$) and that the cubes have volume $r^d$. We can now
construct a new measure,

$$M(q, r) = \sum_{i=1}^{N(r)}\mu_i^q\,r^{\alpha} = N(q, r)\,r^{\alpha} \xrightarrow{\,r \to 0\,} \begin{cases} 0, & \alpha > \tau(q), \\ \infty, & \alpha < \tau(q), \end{cases} \tag{3.31}$$

with $N(r)$ being the number of occupied cells and $N(q, r)$ the number of occupied
cells, weighted by $q$. The measure is dominated, for large positive values of $q$, by
the cells which are more populated, and for large negative $q$ by the cells which are sparsely
occupied. This is a very important property of multifractal analysis, since
under-dense and over-dense regions are probed by different values of $q$.

In the limit $\alpha \to \tau(q)$, it follows that $M$ will be finite and nonzero. Then we can
see from eq. (3.31) that,

$$N(q, r) = \sum_{i=1}^{N(r)}\mu_i^q \tag{3.32}$$

and

$$\tau(q) = \lim_{r \to 0}\frac{\ln N(q, r)}{\ln(1/r)}. \tag{3.33}$$

Figure 3.6: Results of the mock PSCz catalogue from a ΛCDM model. Left: The partition
sum $Z_2$ varying with distance (horizontal axis: $r$ in Mpc). This is very closely related to
the 2PCF. The direct derivative of this plot can determine the correlation dimension varying
with scale. [Figures from Pan & Coles, 2001]
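
The counts-in-cells measure can be sketched in a few lines. The code below bins a point set
into cubic cells of side $r$, forms the cell fractions $\mu_i$ and sums their $q$-th powers
over occupied cells. The function names and the toy point set are illustrative assumptions.

```python
import numpy as np

def cell_measures(points, cell_size):
    """mu_i = N_i / N_tot for every occupied cubic cell of side cell_size."""
    idx = np.floor(np.asarray(points, dtype=float) / cell_size).astype(int)
    _, counts = np.unique(idx, axis=0, return_counts=True)
    return counts / counts.sum()

def weighted_cell_count(points, cell_size, q):
    """N(q, r): sum over occupied cells of mu_i**q."""
    return np.sum(cell_measures(points, cell_size) ** q)

# Toy example: one point in each octant of the unit cube, cells of side 0.5.
octants = np.array([[x, y, z] for x in (0.25, 0.75)
                               for y in (0.25, 0.75)
                               for z in (0.25, 0.75)])
print(weighted_cell_count(octants, 0.5, q=0))   # 8.0 occupied cells
print(weighted_cell_count(octants, 0.5, q=2))   # 8 * (1/8)**2 = 0.125
```

Only occupied cells enter the sum, so negative values of $q$ remain well defined and
emphasise the sparsely populated cells.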

3.4.3 The $f(\alpha)$ Curve



Grassberger et al. (1988) show that we can rewrite the usual fractal measure as,

$$p_i \sim r^{\alpha_i}. \tag{3.34}$$

The distribution of the scaling indices $\alpha_i$ characterises the dimensionality of the
survey. This is evaluated using the $\alpha$-spectrum,

$$n(\alpha)\,d\alpha \sim |\ln r|^{1/2}\,r^{-f(\alpha)}\,d\alpha, \tag{3.35}$$

where $n(\alpha)\,d\alpha$ is the number of times that $\alpha$ takes values in the range $(\alpha, \alpha + d\alpha)$.
For a homogeneous fractal distribution the $f(\alpha)$ curve reduces to a single point:
$\alpha_0 = f(\alpha_0) = D_0$. In any case the statistical properties of a distribution are equally
described by either the generalised dimensions, $D_q$, or by the $f(\alpha)$ curve, since they
are a Legendre pair, as shown below. The only main drawback, as we will soon see,
is that the latter strategy requires an extra differentiation of the data.

We can show this by considering the integral version of equation (3.27),

$$Z(q, r) = \frac{1}{M}\sum_{i=1}^{M}[p_i(r)]^{q-1} = \int |\ln r|^{1/2}\,r^{q\alpha - f(\alpha)}\,d\alpha. \tag{3.36}$$

A solution to the above expression can be found using the Laplace integral
approximation (Martinez et al. 1990), giving,

$$Z(q, r) \simeq \left[\frac{2\pi}{|f''(\alpha(q))|}\right]^{1/2} r^{q\alpha(q) - f(\alpha(q))}. \tag{3.37}$$

The conditions of this theorem defining the function $\alpha(q)$ are,

$$\left.\frac{d f(\alpha')}{d\alpha'}\right|_{\alpha' = \alpha(q)} = q \tag{3.38}$$

and

$$\frac{d^2 f(\alpha)}{d\alpha^2} < 0. \tag{3.39}$$

Using $Z(q, r) = \mathrm{const} \times r^{\tau(q)}$ and (3.37) we get,

$$\tau(q) = \alpha(q)\,q - f(\alpha), \tag{3.40}$$

and using (3.38) with (3.39) leads to,

$$\alpha(q) = \frac{d\tau}{dq}. \tag{3.41}$$

Equations (3.40) and (3.41) relate the variable pairs $(q, \tau)$ and $(\alpha, f)$: a Legendre
transform. So we can see that the distribution is equivalently characterised using
either method. However, in practical terms the generalised dimension approach may
prove to be more accurate, since the $f(\alpha)$ curves require a further differentiation of
the data, i.e. the $f(\alpha)$ curves (fig. 3.7) are related to the derivative of the $D_q$'s (fig. 3.5)
through eq. (3.41).

Figure 3.7: Two fractal distributions with their corresponding $f(\alpha)$ curves below.
Left: A monofractal distribution produces a single point in the $f(\alpha)$-$\alpha$ plane. Right:
A genuine multifractal point set produces a curve in the $f(\alpha)$-$\alpha$ plane. [Figures
from Jones et al. 1988]
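
The Legendre transform between $(q, \tau)$ and $(\alpha, f)$ can be sketched numerically: given
$\tau(q)$ on a grid of $q$ values, $\alpha$ follows from a finite-difference derivative and
$f$ from $f = q\alpha - \tau$. The function name `legendre_spectrum` and the toy monofractal
are assumptions made for illustration.

```python
import numpy as np

def legendre_spectrum(q, tau):
    """alpha(q) = d tau / d q and f = q * alpha - tau, evaluated on the q grid."""
    alpha = np.gradient(tau, q)       # numerical derivative of tau(q)
    f = q * alpha - tau
    return alpha, f

# Toy monofractal of dimension D: D_q = D for all q, so tau(q) = (q - 1) * D.
D = 2.0
q = np.linspace(-5.0, 5.0, 41)
tau = (q - 1.0) * D

alpha, f = legendre_spectrum(q, tau)
print(alpha.min(), alpha.max(), f.min(), f.max())   # all approximately D
```

For the monofractal the whole curve collapses to the single point $\alpha = f = D$. With
measured, noisy $\tau(q)$ the numerical derivative amplifies the noise, which is the
practical drawback noted in the text.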



In figure 3.7 two realisations of a multiplicative random fractal are shown with
their corresponding $f(\alpha)$ curves. The method for constructing these distributions is
discussed in Jones et al. (1988).

Chapter 4

Boundary Corrections 



As we touched on in §1.3.1, the analysis of redshift surveys is prone to many problems,
some subtle and some not so subtle. The presence of large 'holes' in a survey - regions
where no galaxies have been observed - is clearly an example of a serious problem
which needs to be appropriately corrected for.

To analyse real (or indeed mock) galaxy surveys we must, therefore, deal with
the practical issue of incomplete sky coverage. This can arise firstly because of the
geometry of the survey, which is usually a thin beam or a fan. Figure 4.1 illustrates
the latter case for the recent example of the 2dF galaxy redshift survey (2dFGRS,
Colless et al. 1999). The figure shows the projected distribution of galaxies on the
plane of the sky, from which we see that the coverage of the survey is not all-sky
- i.e. we do not sample galaxies over $4\pi$ steradians. Also, we notice that there are
small patches and strips within the geometrical area of the survey which were not
sampled.

Secondly, incomplete sky coverage can be caused by the extinction of light through
parts of our own galaxy, or regions being obscured by local objects. Hence, for
example, even redshift surveys such as the IRAS PSCz (Saunders et al. 2000) which
set out to be all-sky are missing galaxies within a 'mask' close to the plane of the
Milky Way galaxy.

Figure 4.1: The 2dFGRS southern region shown in projection. The colour scale gives the
varying magnitude limit from plate to plate; white denotes regions which lie outside the
survey. [Figure from the 2dF website: www.mso.anu.edu.au/2dFGRS]

Thirdly, of course, as we already discussed in §1.3.1, redshift surveys are affected
by radial incompleteness, which we describe in terms of a selection function, caused
by the flux limit below which distant galaxies are too faint to be observed. We
can see the influence of this flux limit in Figure 4.1, which shows the variation in
magnitude limit from plate to plate in the 2dFGRS.

So, in summary, information about the 'true' population of galaxies is hidden or 
distorted by the effect of a flux limit, by the presence of masked regions and by the 
boundary of the survey itself. 

A number of methods have been created to account for the problems mentioned
above. In the following sections a few of these methods will be reviewed and in
§4.6 a new correction technique is introduced, whose benefits include computational
and statistical efficiency.

4.1 Deflation Method 

As we consider placing spherical shells of increasing radius around a galaxy, even
allowing for our 'weighting' of the number count of galaxies in each sphere by the
radial selection function of the survey, it is clear that we will eventually reach the
edge of the survey (which of course we can think of as the distance beyond which the
selection function is equal to zero). Therefore, unless some form of edge correction is
applied, further increases in the shell radius will result in an estimate of the density
for shells that are systematically underpopulated relative to the mean density of the
underlying galaxy population, because of the volume of each shell that lies outside
the survey. This effect is illustrated in the upper panel of Figure 4.2. For the outer
(shaded) shell in this diagram, the estimated density will be systematically lower
than the true density since the shell includes a region that lies entirely outside the
survey volume and so, by definition, will contain no galaxies.

Figure 4.2: Panel (a) illustrates the state of concentric shells around a galaxy in the
survey. As the radius of the shell increases, eventually density is measured for shells that
are partially external to the survey, such as the shaded shell above. The lower panel (b)
demonstrates the maximum distance to which this survey can be probed: the radius of
the largest sphere which can totally be contained within the survey. [Figure from Hatton,
1999]

The deflation method is, perhaps, the simplest and crudest form of boundary
correction. It simply restricts the sum in equation (3.25) to include only those counting
spheres which lie completely within the survey region. However, this drastically
reduces the distance out to which the density estimator can reliably probe, leading
effectively to a 'cosmic variance'^1 problem at larger radii.

The maximum scale, $R_{surv}$, which is probed using the deflation method is defined
as (Hatton, 1999),

$$R_{surv} = \frac{d\,\sin\theta_{surv}}{1 + \sin\theta_{surv}}, \tag{4.1}$$

where $\theta_{surv}$ is the opening angle of the survey and $d$ the volume limit.

As an example, the Stromlo-APM redshift survey (Loveday et al. 1992) had an
opening angle $\theta_{surv} = 22.6°$ and a volume limit $d \approx 110\,h^{-1}$ Mpc; this leads to a
maximum sampling radius of only $R_{surv} = 30.5\,h^{-1}$ Mpc.
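
The Stromlo-APM figure quoted above can be reproduced directly from the expression for the
maximum scale. The function name `max_deflation_radius` is an illustrative assumption.

```python
import math

def max_deflation_radius(d, theta_deg):
    """Largest counting-sphere radius that stays inside the survey, eq. (4.1)."""
    s = math.sin(math.radians(theta_deg))
    return d * s / (1.0 + s)

# Stromlo-APM example from the text: theta = 22.6 degrees, d = 110 h^-1 Mpc.
print(round(max_deflation_radius(110.0, 22.6), 1))   # 30.5
```

Less than a third of the survey depth is usable in this example, which illustrates how
wasteful the deflation method is for narrow survey geometries.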



4.2 Capacity Correction 

The capacity correction can be thought of as the next step up from the deflation
method. Here we allow all counting spheres while using equation (3.25), even those
which cross the boundary or contain masks. The missing volume is accounted for
by re-weighting the contribution of the (incomplete) counting sphere, essentially
equivalent to filling it with a distribution of mock galaxies. The main problem is
in deciding which distribution should be used to fill the void. Borgani et al. (1994)
chose to weight each cell by a factor $f_i(r)$ which is determined by the missing volume.

This at first glance may seem to be a valid choice. On the other hand the
weighting factor should be, more correctly, proportional to some measure of the
average density in the counting sphere. This highlights the potential problem with
the capacity correction: even though it can, in principle, be applied to considerably
larger counting spheres than the deflation correction, its form is fundamentally flawed
since it assumes an answer to the question being posed, i.e. that $n \propto r^3$, which
is a statement of exactly the homogeneity that we are trying to test.

¹Cosmic variance is the cosmologist's term for sample variance. If the scales being sampled
are comparable in size to the survey, then only a few independent measurements can be taken,
i.e. only a few values are available for averaging.

Figure 4.3: A counting sphere centred on galaxy O with radius r. Regions R1 & R2 are
inside and outside the survey respectively.

In practice the capacity correction is applied using the expression,

$$n_i(r) = n_i^*(r) + \bar{\rho}\,\frac{M}{M_{\rm tot}}\,V(r), \qquad (4.2)$$

to obtain the corrected number of objects within radius r from a given particle. The
RHS of eq. (4.2) contains two main terms: the reduced number count, $n_i^*$, and a term
accounting for galaxies which are missing. The latter is obtained by calculating the
missing volume of the survey, $(M/M_{\rm tot})\,V(r)$, and filling it with a density $\bar{\rho}$
corresponding to the average density of the survey.

To determine the missing volume of the sphere one can place random points 
within it, or equivalently one can shoot off vectors in random directions from the 
central point in the sphere, and count how many of these lie within the survey. This 
approach is known as a Monte Carlo technique. The expression for the missing 
volume, therefore, requires us to determine two numbers:

1. the number $M$ of random vectors emanating from the central galaxy which fall
outside the survey

2. the total number $M_{\rm tot}$ of random vectors emanating from the central galaxy.
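
The Monte Carlo estimate described above can be sketched as follows. This is an
illustration under stated assumptions, not the code used for this work: the survey
geometry (`inside_half_space`, a survey occupying the half-space z < 100) is
hypothetical, random points are drawn uniformly within the counting sphere, and
the corrected count is taken to be the reduced count plus the mean density times
the missing volume.

```python
import math
import random

def random_point_in_sphere(centre, radius, rng):
    """Draw a point uniformly within a sphere by rejection from its bounding cube."""
    while True:
        offset = [rng.uniform(-radius, radius) for _ in range(3)]
        if sum(c * c for c in offset) <= radius * radius:
            return [centre[k] + offset[k] for k in range(3)]

def missing_fraction(centre, radius, inside_survey, m_tot=20000, seed=1):
    """Estimate M / M_tot, the fraction of the counting sphere outside the survey."""
    rng = random.Random(seed)
    m_out = sum(
        1 for _ in range(m_tot)
        if not inside_survey(random_point_in_sphere(centre, radius, rng))
    )
    return m_out / m_tot

def capacity_corrected_count(n_reduced, centre, radius, inside_survey, mean_density):
    """Reduced count plus mean density times the missing volume (assumed form)."""
    volume = 4.0 / 3.0 * math.pi * radius ** 3
    frac = missing_fraction(centre, radius, inside_survey)
    return n_reduced + mean_density * frac * volume

def inside_half_space(point):
    """Hypothetical survey geometry: everything with z < 100 is observed."""
    return point[2] < 100.0

frac_edge = missing_fraction([0.0, 0.0, 100.0], 20.0, inside_half_space)
frac_deep = missing_fraction([0.0, 0.0, 0.0], 20.0, inside_half_space)
```

A sphere centred exactly on the boundary plane should lose about half of its
volume, while a sphere deep inside the survey loses none, so its count is left
unchanged.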

Of course, choosing to fill the missing portion of the counting cell with any
presupposed density should be considered bad science. What we would ultimately like
to do is fill the missing regions with the right number of particles. To do this we
need information which is hidden from us, behind the mask, but the Cosmological
Principle may come to our rescue.



4.3 Angular Correction 

In 2002 Pan & Coles used the assumption of isotropy² to infer properties of
unknown regions, masks and boundaries, from the well-known survey space. Essentially
they proposed that you can average over the part of the sphere which is observed
and use this average to fill in for the missing regions. Of course this has to be done
in an angular fashion, since $\rho(r)$ is assumed not to depend on direction.

Assuming that the universe is statistically the same in all directions, they
concluded that the number of galaxies in a given solid angle should be comparable to
the number of galaxies in the same solid angle but in a different direction. The average
density of galaxies per steradian can then be defined. In order to achieve this, they
introduce the weighting factor $f_i(r)$, which has values $0 < f_i < 1$. The value of the
weight determines how much of the counting sphere is missing from the survey, and
therefore needs to be accounted for. Then the corrected number of galaxies in a cell
is computed from,

$$n_i(r) = \frac{1}{f_i(r)} \sum_{j:\,|\mathbf{r}_j - \mathbf{r}_i| < r} \frac{1}{\phi(r_j)}, \qquad (4.4)$$

where $\phi$ is the selection function for a flux limited sample. The weighting factor
appears in the above expression to increase the number count accordingly, just as
the selection function does.

²The isotropy of the universe is a cornerstone of modern cosmology, so in this case it is not a
bad assumption to make.

Figure 4.4: This is a counting cell within the survey. It has a masked region (R1) and a
missing portion due to the intersection with the boundary (R3). The slices AOB and COD
encompass both of these missing parts.

To implement the angular correction one would start as usual and centre on a
galaxy within the survey. At a given radius the counting sphere may contain a part
of a masked region (indicated by region R1 in fig. 4.4) or the survey boundary (R3
in fig. 4.4). If this were to occur, the solid angles which contain these features are
cut out (slices AOB & COD in fig. 4.4) and replaced by an average over the rest of
the cell. As an example, for figure 4.4 the weighting factor in this 2-D analogy is
$f_0 = 1 - (\theta_1 + \theta_2)/2\pi$. For computational purposes, the method to calculate
equation (4.4) is as follows:

1 From the set of all galaxies, choose a galaxy as the centre of a counting
sphere of radius $R$ and determine its position relative to the mask and
boundary.

2 'Shoot off' random vectors $(r, \theta, \phi)$ from the centre, where:

• $r$ is sampled uniformly $\in (0, R)$, with $R$ the sphere radius.

• $\phi$ is sampled uniformly $\in (0, 2\pi)$.

• $\theta = \sin^{-1}(U)$, where $U$ is sampled uniformly $\in (-1, 1)$.

3 Determine which vectors lie outside the survey region, and save the
components of any vector which does so.

4 Using the scalar product expressed in terms of vector components,
calculate the angle between each pair of vectors which lie inside the survey.
This allows us to define a reference direction, and a maximum angle, $\theta_{\rm max}$,
between vectors which lie inside the survey ($\theta_{\rm max}$ is the opening angle of
the cone which points in the reference direction).

5 Now begin counting those galaxies in the counting sphere which make an
angle $\theta > \theta_{\rm max}$ relative to our reference direction.

6 The weighting factor, $f_i(r)$, is then simply related to $\theta_{\rm max}$.
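
One possible reading of these steps is sketched below. Several choices are
assumptions made purely for illustration: the survey is a hypothetical half-space
(`inside_half_space`), the reference direction is taken as the mean direction of
the vectors that leave the survey, $\theta_{\rm max}$ is the largest angle between that
direction and any such vector, and the weight is the fraction of the full solid
angle left outside the excluded cone, $f = (1 + \cos\theta_{\rm max})/2$.

```python
import math
import random

def random_vector(radius, rng):
    """Step 2: r uniform in (0, R), phi uniform in (0, 2*pi), theta = asin(U)."""
    r = rng.uniform(0.0, radius)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    theta = math.asin(rng.uniform(-1.0, 1.0))  # latitude, uniform in sin(theta)
    return (r * math.cos(theta) * math.cos(phi),
            r * math.cos(theta) * math.sin(phi),
            r * math.sin(theta))

def angular_weight(centre, radius, inside_survey, n_vec=20000, seed=2):
    """Return (weight, theta_max) for one counting sphere (assumed cone geometry)."""
    rng = random.Random(seed)
    outside = []
    for _ in range(n_vec):
        v = random_vector(radius, rng)
        point = tuple(c + dv for c, dv in zip(centre, v))
        if not inside_survey(point):          # step 3: keep vectors leaving the survey
            norm = math.sqrt(sum(x * x for x in v))
            outside.append(tuple(x / norm for x in v))
    if not outside:
        return 1.0, 0.0                       # sphere entirely inside the survey
    mean = [sum(u[k] for u in outside) for k in range(3)]
    norm = math.sqrt(sum(x * x for x in mean))
    ref = [x / norm for x in mean]            # assumed reference direction
    theta_max = max(
        math.acos(max(-1.0, min(1.0, sum(a * b for a, b in zip(u, ref)))))
        for u in outside
    )
    return 0.5 * (1.0 + math.cos(theta_max)), theta_max

def inside_half_space(point):
    """Hypothetical survey geometry: everything with z < 10 is observed."""
    return point[2] < 10.0

weight, theta_max = angular_weight((0.0, 0.0, 0.0), 20.0, inside_half_space)
```

For a sphere of radius 20 centred a distance 10 from a planar boundary, the
excluded cone has a half-angle of 60 degrees, so the weight should come out close
to 0.75. The sketch also shows why so many vectors are needed: $\theta_{\rm max}$ is set
by the few vectors that graze the edge of the missing region.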

This is essentially the algorithm developed by Pan & Coles (2002) which they 
tested on simulated catalogues and applied to analyse the fractal clustering of the 
IRAS PSCz survey. 

The angular correction, although in principle a very successful method to correct
for boundary effects, is very slow and inefficient. Moreover, and worse still, it throws
away potentially useful data. We can see this from Figure 4.4, where the galaxies in
regions R2 and R4 are excluded by the angular correction.

In the next section we consider a method which has the potential to improve 
further upon the angular correction. 
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4.4 Volume Correction 

The basic idea of our new volume correction can be illustrated in figure 4.3. Here the
counting sphere has exceeded the geometrical boundary of the survey. The number
of galaxies counted in the sphere of radius $r$ is depleted, which leads to $p_i(r)$ being
reduced through equation (3.25). As we have seen, to solve this problem we could
either add galaxies to the missing region, as is the case with the capacity correction,
or equivalently we could somehow modify our definition of the volume itself (hence
the name for our new correction!). Of course you may notice that eq. (3.25) does not
contain any explicit reference to the volume, but we can cast this equation as,

$$p_i(r) = \frac{n_i(r)}{N} = \frac{\rho(r)\,V(r)}{N}, \qquad (4.5)$$

$$\rho(r) \simeq \rho^*(r) = \frac{n^*(r)}{V^*(r)}, \qquad (4.6)$$

with $V$ being the true volume of the sphere and $V^*$ being what we can term the
reduced volume. We have also introduced the reduced density, $\rho^*(r)$, as an
intermediary step which need not be calculated, and related this with a reduced volume
and number count, $n^*$. On its own this method can be visualised in figure (4.3) as
assigning to the missing region (R2) the same density as that of region (R1). This
would be wrong if density varies with distance, so that $\rho_{R1} \ne \rho_{R2}$. To overcome this
problem we assume only that the density does not vary with $\theta$ or $\phi$, i.e. the universe
is isotropic and hence equation (4.6) will hold for fixed $r$. So to apply this method
to a galaxy survey we must count in spherical shells, correcting our estimate of the
density in each shell as we go along, and then integrate up the shells at the end.
This method is illustrated in figure (4.5). The shells are individually corrected and
summed according to,

$$n(r) = \sum_{r_i = 0}^{r} \alpha_i(r_i)\, n^*_i(r_i), \qquad (4.7)$$

where $\alpha_i(r) = V/V^*$; this is the enhancement factor of the $i^{\rm th}$ shell at radius $r$ and has
value $\ge 1$.

Figure 4.5: A counting shell centred on galaxy O with radius r. Region R1 is inside the
survey, R2 & R3 are outside the survey. The missing parts of the shell (R2 & R3) are
replaced by the average over the rest of the shell.

The main advantage of this method is that it makes the maximum use of the data.
Specifically, if the boundary edge cuts across a counting sphere at a particular radius,
the method still makes full use of galaxies at smaller radii from the centre of the
counting sphere, even if they lie in the solid angle subtended by the boundary edge.

The deflation method and the angular correction of Pan & Coles, on the other
hand, throw away a lot of potentially useful data, which limits the counting sphere
radius within which the density may be reliably estimated.
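
The shell-by-shell idea can be sketched in a few lines. This is a minimal
illustration, assuming that the enhancement factor of each shell is the ratio of
its full volume to the part lying inside the survey; the function names, the shell
binning and the half-space geometry used in the example are all hypothetical.

```python
import math

def volume_corrected_count(distances, r_max, dr, inside_fraction):
    """Sum shell counts, each enhanced by alpha = V / V* = 1 / inside_fraction.

    distances       -- distances from the centre to the observed galaxies
    inside_fraction -- callable giving the fraction of the shell at radius r
                       that lies inside the survey (assumed known or estimated)
    """
    n_shells = int(math.ceil(r_max / dr))
    counts = [0] * n_shells
    for d in distances:
        if d < r_max:
            counts[min(int(d / dr), n_shells - 1)] += 1
    total = 0.0
    for k, n_star in enumerate(counts):
        frac = inside_fraction((k + 0.5) * dr)   # evaluated at the shell mid-radius
        if frac <= 0.0:
            continue                             # nothing observed in this shell
        total += n_star / frac                   # alpha * n*
    return total

def half_space_fraction(r, z0=5.0):
    """Fraction of a spherical shell of radius r lying below a plane at height z0."""
    return 1.0 if r <= z0 else 0.5 * (1.0 + z0 / r)

corrected = volume_corrected_count([1.5, 2.5, 7.5], 10.0, 1.0, half_space_fraction)
```

In this example the two inner galaxies sit in complete shells and are counted once
each, while the galaxy at 7.5 lies in a shell that is only 5/6 observed and so
contributes 1.2. Galaxies at small radii are never discarded, whatever direction
they lie in.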

4.5 Practical Computing Issues 

One significant drawback faced when implementing the angular correction is that it
must fill the counting spheres with random vectors to determine the missing regions.
To obtain the required resolution means placing many random vectors at the centre
of the sphere. For a cell with $R \approx 100\,h^{-1}$ Mpc we have found that this requires
the generation of ~100,000 random points. This is not a trivial computational task
given that the vectors must be computed and angles stored at every step.

Figure 4.6: The convergence of $\theta_{\rm max}$ for increasing vector density (horizontal axis:
random vector density in Mpc$^{-3}$). The graph seems to converge at a random vector
density of ~0.25 Mpc$^{-3}$.

To see where this comes from consider figure 4.6. Here a simulation has been
set up to mimic what happens when computing the angular correction. A counting
sphere is placed close to the survey boundary and $\theta_{\rm max}$ is calculated for varying
numbers of random vectors. From the plot we can see the convergence of the opening
angle at an approximate vector density of $\rho_{\rm vec} \approx 0.25\,h^3\,{\rm Mpc}^{-3}$. Taking this value
we can make a back-of-the-envelope calculation of the required number of random
vectors to accurately constrain $\theta_{\rm max}$ at a typical scale of $100\,h^{-1}$ Mpc. The number
of vectors required is then,

$$N(r) = \rho_{\rm vec}\,V(r)$$
$$N(r) = \rho_{\rm vec}\,\frac{4\pi}{3}\,r^3$$
$$N(r = 100) \approx 100,000$$

To make the comparison with the new volume correction, the same calculation is
performed. In this case, however, we are calculating the number of vectors required
to populate a spherical shell with thickness $dr = 1$ Mpc, as a typical value.

$$V(r) \approx 4\pi r^2\,dr$$
$$N(r) = 4\pi \rho_{\rm vec}\,r^2\,dr$$
$$N(r = 100) \approx 30,000$$

The volume correction is definitely more computationally efficient, especially when
you consider that the missing volume must be calculated for every galaxy and at
every distance iteration. One should also bear in mind that these codes could be
used on much bigger surveys: SDSS is now pushing 1 million galaxies.
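
The two estimates can be evaluated directly. The sketch below simply multiplies
the quoted vector density by the volume of a full sphere and of a thin shell, using
the values stated in the text; the variable names are illustrative.

```python
import math

rho_vec = 0.25   # random-vector density quoted in the text (per unit volume)
r = 100.0        # typical scale
dr = 1.0         # shell thickness

n_sphere = rho_vec * 4.0 / 3.0 * math.pi * r ** 3   # fill the whole counting sphere
n_shell = rho_vec * 4.0 * math.pi * r ** 2 * dr     # fill a single shell
ratio = n_sphere / n_shell                          # equals r / (3 dr)

print(f"sphere: {n_sphere:.3g}, shell: {n_shell:.3g}, ratio: {ratio:.1f}")
```

The shell figure comes out at about 3.1 x 10^4, in line with the ~30,000 quoted
above. Evaluated with the same density, the full-sphere expression gives about
1.0 x 10^6, which is larger than the ~100,000 quoted, so the quoted count may rest
on a different normalisation. Either way, the ratio between the two approaches is
fixed by geometry at r/(3 dr), roughly 33 at this scale.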

4.6 Error Analysis 

Error estimation in fractal analysis has been largely swept under the carpet by many
in this field. One of the reasons for this is the computational cost of using, e.g., a set
of mock galaxy catalogues to obtain error estimates via a Monte Carlo approach.
Nevertheless, we can obtain an approximate expression for the error on an estimate
of the galaxy number density via some remarkably simple mathematics.

We follow Grassberger and Procaccia's (1983) suggestion and use the partition
function $Z(q, r)$,

$$Z(q, r) = \frac{1}{M} \sum_{i=1}^{M} \left[p_i(r)\right]^{q-1} \propto r^{\tau(q)}, \qquad (4.8)$$

to estimate the generalised dimensions of a set of galaxies. Here $p_i$ is the
probability,

$$p_i = \frac{n_i}{N}, \qquad (4.9)$$

of a cell, centred on the $i^{\rm th}$ galaxy, having an occupation number $n_i$; $r$ is the cell size,
while $\tau(q)$ is a scaling exponent.
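
A brute-force evaluation of the partition function for a small point set might look
as follows. The sketch assumes that $Z(q, r)$ is the mean of $p_i^{q-1}$ over cells, that
every point serves as a cell centre (so $M = N$), and that each cell's occupation
number includes its own central point; no boundary correction is applied.

```python
import math

def partition_function(points, r, q):
    """Z(q, r): mean over cells of p_i^(q-1), with p_i = n_i / N (assumed form)."""
    n_total = len(points)
    z = 0.0
    for centre in points:
        # occupation number of the cell of radius r, central point included
        n_i = sum(1 for p in points if math.dist(centre, p) <= r)
        z += (n_i / n_total) ** (q - 1)
    return z / n_total

# Four points on the corners of a unit square: with r = 1.1 each cell holds
# its centre and the two adjacent corners, so p_i = 3/4 for every cell.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
z2 = partition_function(square, 1.1, 2)
z3 = partition_function(square, 1.1, 3)
```

The pair search here scales as $N^2$, which is only practical for small test sets; a
spatial tree or grid would be needed for a real catalogue.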



By construction the mean of the partition function is positive, as there are always
cells with galaxies within. However, there is a subtlety when handling astronomical
data: in expression (4.8) we assume that any set contains an infinite number of
elements and that the size of the cells vanishes. Redshift surveys, however, contain a
finite number of objects in a finite volume. This will lead to configurations where none
of the cells have galaxies other than the central one in the limit $r \to 0$.

We can then construct the second moment of the distribution and rearrange as
below.

[Equations (4.10) to (4.15); of equation (4.15) only the summation limits survive:
single sums over $i = 1, \dots, M$ and a double sum over $i = 1, \dots, M$, $j = i+1, \dots, M$.]

Figure 4.7: This is a log-log plot of the partition function varying with distance. The larger
error bars are estimated from the prescription discussed in §4.6, whereas the smaller errors
are obtained from averaging the results of 100 Levy Flight fractal simulations. This fractal
has 1,000,000 particles with $D_2 = 1.2$ and is contained in a (400 Mpc)$^3$ box. Left: The
points in this plot have not been averaged. Right: Points have been averaged over 100
distributions.


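where $\langle \cdots \rangle$ represents an ensemble average. Applying the Cosmological Ergodic
Theorem to the above equation leads to sums over the cells. Thus, for $q = q'$ the
$\langle Z(q)Z(q') \rangle$ can be cast as

$$\langle Z(q)Z(q) \rangle = \frac{1}{M}\sum_i \langle p_i^{\,q-1}\,p_i^{\,q-1} \rangle + \text{cross terms} \qquad (4.16)$$

$$= \langle Z(2q-1) \rangle + \text{cross terms} \qquad (4.17)$$

$$= \frac{1}{M}\sum_i \langle (p_i)^{(2q-1)-1} \rangle + \text{cross terms} \qquad (4.18)$$

[Equation (4.19)]

We can then estimate the standard deviation directly from,

$$\sigma_Z^2 \simeq Z(2q-1) - Z(q)^2 \qquad (4.20)$$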
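
To make the estimate concrete, the sketch below assumes that equation (4.20) reads
$\sigma_Z^2 \simeq Z(2q-1) - Z(q)^2$ and that $Z(q)$ is the mean of $p_i^{q-1}$ over the cells.
Under those assumptions the expression is just the variance of $p_i^{q-1}$ from cell
to cell, which the example confirms against the standard library.

```python
import statistics

def z_moment(p, q):
    """Z(q) as the mean of p_i^(q-1) over cells (assumed normalisation)."""
    return sum(p_i ** (q - 1) for p_i in p) / len(p)

def z_variance(p, q):
    """Assumed reading of eq. (4.20): Z(2q - 1) - Z(q)^2."""
    return z_moment(p, 2 * q - 1) - z_moment(p, q) ** 2

# Illustrative cell probabilities only, not survey data
p_cells = [0.1, 0.2, 0.3]
sigma2 = z_variance(p_cells, 2)
reference = statistics.pvariance(p_cells)   # population variance of p_i for q = 2
```

Read this way, the estimate describes the scatter among individual cells rather
than the uncertainty on their mean, so it needs no mock catalogues and costs only
one extra evaluation of the partition function.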

We show a comparison between our error estimation, from eq. (4.20), and an error
estimation from 100 simulated Levy Flight fractals (see figure 4.7).

The left hand plot of figure (4.7) shows the partition function calculated as a
function of distance. To each point is attached a larger error bar estimated from the
prescription discussed in §4.6 and a smaller error bar obtained via an average over the
100 simulations. We can see from the plot that the gradient of Z is constant and the
points are all very close to the fitted straight line. The fit was obtained by minimising
the $\chi^2$ function. The right hand plot shows similar points but now the partition
function calculated at each distance is also averaged over the 100 simulations, with
error bars computed as before. The excellent agreement in the fitted slopes of the left
and right hand plots indicates that the partition function is an unbiased estimator.
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5 



Analysis & Results

We will begin in the following section by applying the different corrections to
a toy fractal model. This model can be used to compare and contrast the differing
methods. Then the volume correction will be used to analyse in detail the distribution
of particles produced from an N-body simulation.



5.1 The Levy Flight

An initial test for the different corrections is to analyse a simple fractal
distribution, the Levy Flight. This fractal is very easy to construct and has an
analytically determined dimension (see Meakin 1998).

The Levy Flight fractal is finding its way into many areas of physics due to its
close connection to Brownian random motion. For example, it has been used to
explain interstellar scintillation (Boldyrev & Gwinn 2003) and even to model the
financial market (Chowdhury & Stauffer 1999).
The Levy Flight is constructed as follows:

1 A point A is chosen at random, for example the origin, in Cartesian space.

2 A displacement is given to A by a vector $(\theta, \phi, R)$ to give a second point
B. The angular direction is uniformly sampled and the probability of $R$
exceeding a value $r$ is given by,

$$P(R > r) = \begin{cases} (r/r_0)^{-D}, & r/r_0 \ge 1 \\ 1, & r/r_0 < 1 \end{cases} \qquad (5.1)$$

3 This procedure is repeated many times to 'fill' the 3-D space, which was
restricted to a cubic box of side 400 Mpc.
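
The construction can be sketched as below. The step lengths are drawn by inverting
the tail probability $P(R > r) = (r/r_0)^{-D}$ for $r \ge r_0$, which is assumed to be the
form of the step distribution. How the walk is confined to the box is not specified
in the text, so periodic wrapping is used here purely as an illustrative choice,
and the default parameter values are examples.

```python
import math
import random

def levy_step_length(r0, dim, rng):
    """Inverse-transform sample with P(R > r) = (r / r0)^(-dim) for r >= r0."""
    u = 1.0 - rng.random()            # uniform in (0, 1], avoids division by zero
    return r0 * u ** (-1.0 / dim)

def levy_flight(n_points, r0=0.2, dim=1.2, box=400.0, seed=3):
    """Generate n_points of a 3-D Levy flight inside a periodic cubic box."""
    rng = random.Random(seed)
    pos = [rng.uniform(0.0, box) for _ in range(3)]
    points = [tuple(pos)]
    while len(points) < n_points:
        length = levy_step_length(r0, dim, rng)
        phi = rng.uniform(0.0, 2.0 * math.pi)
        cos_t = rng.uniform(-1.0, 1.0)                 # isotropic direction
        sin_t = math.sqrt(1.0 - cos_t * cos_t)
        step = (length * sin_t * math.cos(phi),
                length * sin_t * math.sin(phi),
                length * cos_t)
        pos = [(pos[k] + step[k]) % box for k in range(3)]   # periodic wrap (assumption)
        points.append(tuple(pos))
    return points

flight = levy_flight(1000)
```

Most steps are close to $r_0$, which builds up tight clumps, while the power-law
tail occasionally produces a long jump that seeds a new clump elsewhere in the box.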



In expression (5.1), $D$ is the fractal dimension and $r_0$ is a characteristic scale
length, both of which can be adjusted to produce different features. The resulting
distribution is not so dissimilar to a true galaxy survey (cf. figure EH]); $r_0$ can be
related to the average inter-cluster separation.



5.1.1 Multifractal Analysis 

A multifractal analysis, as described in §3.4.1, is performed on a Levy Flight
distribution of particles (cf. fig. 5.1). The boundary is corrected by considering each of
the different methods from Ch. 4. This setup should give a fair comparison of the
different correction methods.

Since the correlation dimension is known analytically, the multifractal analysis
will be restricted to the $D_2$ dimension. Now from the partition sum $Z(2, r)$, equation
(3.28) will provide the $D_2$ value through differentiation. This differentiation was
performed directly on the data by applying a linear fit to every three consecutive
data points. The errors, as described in §4.6, were considered when minimising the
$\chi^2$ function.
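
The sliding three-point fit can be sketched as a weighted least-squares line in
log-log space. The sketch assumes that $D_2$ is the logarithmic slope of $Z(2, r)$, that
the errors on $Z$ are propagated to $\log_{10} Z$ in the usual first-order way, and that
minimising $\chi^2$ for a straight line reduces to the closed-form weighted fit used
here; the function names are illustrative.

```python
import math

def weighted_slope(x, y, sigma):
    """Slope of the straight line minimising chi^2 = sum(((y - a - b x) / sigma)^2)."""
    w = [1.0 / (s * s) for s in sigma]
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    return (sw * sxy - sx * sy) / (sw * sxx - sx * sx)

def local_slopes(r, z, sigma_z):
    """Fit every three consecutive points of log Z against log r."""
    x = [math.log10(ri) for ri in r]
    y = [math.log10(zi) for zi in z]
    sig = [s / (zi * math.log(10.0)) for s, zi in zip(sigma_z, z)]  # error on log10 Z
    return [
        (r[k + 1], weighted_slope(x[k:k + 3], y[k:k + 3], sig[k:k + 3]))
        for k in range(len(r) - 2)
    ]

# Synthetic check: a pure power law Z = r^1.2 with 10 per cent errors
radii = [1.0, 2.0, 4.0, 8.0, 16.0]
z_values = [ri ** 1.2 for ri in radii]
slopes = local_slopes(radii, z_values, [0.1 * zi for zi in z_values])
```

Each returned pair gives the central radius of a triplet and the local slope there,
so the estimate of $D_2$ can be followed as a function of scale.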

The results of this procedure are plotted in figure 5.2. There are a few points
to note in this plot. Firstly, the Levy Flight is highly anisotropic, which is a cause
for concern when applying a statistic which does not consider angular information.
Although this may be a problem, following the suggestion of Martinez et al. (1990),
the correct result from the simulation seems to be confined to the range between
the average nearest neighbour separation and the mean inter-particle spacing. In
figure 5.2 it can be easily seen that the volume correction lies closer to the analytical
answer of $D_2 = 1.2$ than either the capacity or angular corrections.

Secondly, the volume correction is more or less always closer to the true value,
even as the methods begin to overestimate on larger scales. This overestimation by
all the methods is due to the anisotropic nature of the Levy Flight distribution.

Thirdly, the large errors and visual noisiness of the angular correction are not
present in either the capacity or volume corrections. This, I can only conclude, is
due to the angular method throwing away data, leading to low number statistics,
i.e. intrinsic noise.

5.2 ΛCDM Simulation

In this section we will analyse the distribution of dark matter halos from a ΛCDM
simulation. Since there are no galaxies in this analysis, it is only the underlying
dark matter distribution which is being probed. This simulation was performed by
Warren et al. (2006); see that reference for a detailed description. For this work we are
using a $384\,h^{-1}$ Mpc box with a flat geometry and cosmological parameters,

$$p = (\Omega_m, \Omega_b, n, h, \sigma_8) = (0.3, 0.04, 1, 0.7, 0.9). \qquad (5.2)$$

Initial conditions were derived from the transfer functions as calculated by
CMBFAST (Seljak & Zaldarriaga, 1996). The final catalogue has approximately 1.5
million halo positions.
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5.2.1 Multifractal Analysis 

The multifractal analysis as presented in §3.4.1 must be applied over a range of
different distance scales. In figure (3.5) the results of this analysis are shown over
two different ranges: $10 < R_1 < 40\,h^{-1}$ Mpc and $50 < R_2 < 100\,h^{-1}$ Mpc. These scales
were chosen almost arbitrarily¹. The left hand plot of fig. (3.5) shows a clear sign
of multifractality on small scales whereas the right hand plot appears to signal a
transition to homogeneity. However, since the partition function (like figure 3.6(a))
is generally smooth, we can expect a smooth transition to homogeneity.

Instead of applying this analysis in certain ranges, the plots in figure 3.5 could
be extended by adding another axis: distance. This would give us a $D_{q,R}$ surface.

The $D_{q,R}$ Surface

Figure 5.3 shows the result of extending the multifractal analysis to include
varying scales. The high peak and dip on small scales ($< 10\,h^{-1}$ Mpc) correspond to a
multifractal distribution. The surface then levels off to a constant value of three,
indicating a transition to homogeneity at a scale of ~30 $h^{-1}$ Mpc. Another interesting
feature is that homogeneity is not reached at the same scale for all $q$.



5.3 PSCz Mock Catalogue 

In the previous section the whole halo catalogue was used, thus the analysis is only
relevant for the underlying dark matter distribution. In this section the galaxies
are under investigation. The galaxy mock catalogue is extracted from the N-body
simulation (cf. §5.3.1).

¹The only reason was that the partition sum, $Z(q,R)$, seemed to have a constant gradient in
these regions.
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5.3.1 Making Mock Catalogues 

Mock PSCz catalogues have been extracted from the N-body simulations performed
by Cole et al. (1997). They used the AP³M code of Couchman (1991) loaded with
192³ particles in a box of comoving size 345.6 $h^{-1}$ Mpc. The particle mass is
approximately $1.6 \times 10^{12}\,\Omega_m\,h^{-1}\,M_\odot$. Further details can be found in Cole et al.
(1997). For the analysis performed in §5 we have considered two different cosmologies
(cf. Table 5.1): a flat model with $\Omega_m = 0.3$ and cosmological constant term
$\Lambda c^2/3H_0^2 = 0.7$, and a critical density universe ($\Omega_m = 1.0$) with power spectrum
shape parameter $\Gamma = 0.25$. The relevant details of the two cosmological models
explored are summarised in the table below.

Model     $\Omega_m$   $\Omega_\Lambda$   $\Gamma$    $\sigma_8$
LCDM      0.3          0.7                0.25        1.13
SCDMG     1.0          0.0                0.25        0.55

Table 5.1: Cosmological Models.



Ten different mock catalogues, which we will refer to as LCDM0i and SCDMG0i,
i = 0, ..., 9, have been extracted from each of the above models.

Although in this work we have not created an N-body code or extracted the mock
catalogues from it, it is still worth discussing the procedure for doing so.

• A population of particles with properties similar to those of the Local Group
(LG) is identified. A LG-like observer is defined by implementing two
observationally based constraints: the peculiar velocity of the point must be
$v_{\rm LG} = 625 \pm 25$ km s$^{-1}$, and the particle must be located in a region for which
the fractional overdensity $\delta$ averaged in a radius of 5 $h^{-1}$ Mpc is in the range
$-0.2 < \delta < 1.0$.

• A sphere of 120 $h^{-1}$ Mpc radius is drawn around the LG-like observer and the
whole frame is rotated so that the motion of the observer is in the direction
($l = 276°$, $b = 30°$), the direction of the LG peculiar velocity with respect to
the CMB frame (e.g. Wilkinson 1988).

• A friends-of-friends algorithm is implemented to find galaxy clusters (see Frenk
et al. 1988 for details).

• The number density of particles in the simulation is ~0.039 $h^3$ Mpc$^{-3}$, while
the number density given by the PSCz selection function exceeds this density
closer than some critical distance. Thus, the simulations are volume-limited for
distances less than 10.9 $h^{-1}$ Mpc, where the PSCz number density (Saunders
et al. 1999) matches the N-body one. For distances greater than this the
simulated surveys follow the PSCz number density.

• A Monte Carlo rejection was used to choose particles according to the PSCz
selection function (Saunders et al. 2000):

$$\phi(r) = \left(\frac{r_0}{r}\right)^{2\alpha} \left(\frac{r_*^2 + r_0^2}{r_*^2 + r^2}\right)^{\beta}, \quad r > r_0. \qquad (5.3)$$

The optimal parameters are listed in Table 5.2. For $r < 10.9\,h^{-1}$ Mpc the
mock catalogues are volume-limited and thus $\phi(r) = 1$. A random flux
consistent with the PSCz selection function is then attributed to each selected
galaxy.

• Despite the large sky coverage, PSCz is not a full-sky catalogue. Unsurveyed
regions are present both at high and low galactic latitudes that need to be
accounted for to properly reproduce the existing observational biases. All the
galaxies which fall within masked regions have been rejected, leaving the final
sky coverage ~84% complete.

$\alpha$    $\beta$    $r_0$     $r_*$
0.53        1.90       10.90     86.40

Table 5.2: Selection Function Parameters for Eq. (5.3)
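
The Monte Carlo rejection step can be sketched as follows. The functional form of
the selection function used here is an assumption: it is chosen to be equal to one
inside the volume-limited radius and to fall off as a double power law beyond it,
using the four parameter values listed in Table 5.2, and it should not be taken as
the exact published PSCz form. The function names are illustrative.

```python
import random

def selection_function(r, alpha=0.53, beta=1.90, r0=10.90, r_star=86.40):
    """Assumed double power-law selection function, equal to 1 for r <= r0."""
    if r <= r0:
        return 1.0
    return ((r0 / r) ** (2.0 * alpha)
            * ((r_star ** 2 + r0 ** 2) / (r_star ** 2 + r ** 2)) ** beta)

def select_particles(distances, phi, seed=4):
    """Keep each particle with probability phi(distance) (Monte Carlo rejection)."""
    rng = random.Random(seed)
    return [d for d in distances if rng.random() < phi(d)]

# Illustrative input: 50 particles at each of four distances (h^-1 Mpc)
sample = [1.0, 5.0, 10.0, 200.0] * 50
kept = select_particles(sample, selection_function)
```

Particles inside the volume-limited region are always kept, while the distant ones
are thinned in proportion to the selection function, which is what turns a uniform
particle load into a flux-limited mock survey.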

The final mock catalogue contains the positions of the galaxies in redshift-space
and their "observed" flux. The galaxy redshifts are assigned by adding the
line-of-sight component of the peculiar velocity to the recession velocity.
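
The redshift assignment amounts to a one-line projection. The sketch below assumes
an observer at the origin, positions in $h^{-1}$ Mpc and velocities in km s$^{-1}$, so that
the Hubble constant is 100 km s$^{-1}$ per $h^{-1}$ Mpc; the function name is illustrative.

```python
import math

H0 = 100.0  # km/s per h^-1 Mpc, so distances in h^-1 Mpc need no explicit h

def redshift_velocity(position, velocity, h0=H0):
    """Recession velocity plus the line-of-sight peculiar velocity (observer at origin)."""
    r = math.sqrt(sum(c * c for c in position))
    v_los = sum(p * v for p, v in zip(position, velocity)) / r   # projection on the sightline
    return h0 * r + v_los

# A galaxy 10 h^-1 Mpc away along x, moving at (100, 50, 0) km/s
cz = redshift_velocity((10.0, 0.0, 0.0), (100.0, 50.0, 0.0))
```

Only the radial component of the peculiar velocity enters, so in this example the
transverse 50 km/s leaves the redshift unchanged and the galaxy appears 1
$h^{-1}$ Mpc further away than it really is.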

5.3.2 Multifractal Analysis 

Figure 5.3 shows the $D_{q,R}$ surface for the halo positions in an ideal and complete
(400 $h^{-1}$ Mpc)$^3$ box. It has a standard ΛCDM cosmology with no galaxies, so only
the underlying dark matter distribution was investigated. Overall, it is a very smooth
surface, which tends towards homogeneity at scales $> 30\,h^{-1}$ Mpc. A clear peak, at
low $q$, and dip, at high $q$, correspond to multifractality for $R < 10\,h^{-1}$ Mpc.

The same $D_{q,R}$ surface analysis is repeated, this time for the mock galaxy
catalogues mentioned above. The main difference is that there are now far fewer
particles to analyse, ~15,000. However, given that we have 10 mock realisations to
average over, the noise should not be much worse. Figure 5.6 shows the averaged
$D_{q,R}$ surfaces of the ΛCDM and the SCDMG cosmologies.

$D_{q,R}$ Surface - ΛCDM

There are some points we should note regarding the $D_{q,R}$ surfaces of figure 5.6. The
ΛCDM cosmology yields a plot quite similar to that of figure 5.3; this should not
be too surprising given that they have the same cosmological parameters. There are,
however, clearly a few features on small scales where the two differ. In fig. (5.6) at
~10 $h^{-1}$ Mpc there is a multifractal feature peaking at $D_{q=-6} \approx 3.5$. A little further
out at ~20 $h^{-1}$ Mpc there is a second multifractal signature with a much higher
peak, $D_{q=-6} \approx 6.5$. Recalling that figure 5.6 is an average taken over 10 realisations,
it is interesting that the two multifractal features on small scales have not merged
into one, i.e. they are probably not statistical anomalies. It would be premature
to suggest that these distinct features have any physical significance², but there is
definitely room here for further study.

On scales larger than 30 $h^{-1}$ Mpc the surface flattens to homogeneity, $D_q \approx 3$.
The white reference line shows the correlation dimension, $D_2$, varying with scale.
This is related to the derivative of the 2PCF. Visually this shows that all the
information which the 2PCF can provide is but a very small part of the $D_{q,R}$ surface.



$D_{q,R}$ Surface - SCDMG

In the lower panel of figure 5.6 the $D_{q,R}$ surface is plotted for the SCDMG cosmology.
It clearly shows on small scales a similar multifractal peak to that of the ΛCDM
model. However, there seems to be only one very sharp, distinct peak at ~20 $h^{-1}$ Mpc,
and for $q = -6$, $D_{q=-6} \approx 11$. Casting our mind back to §3.4.1, this is telling us that on
scales ~20 $h^{-1}$ Mpc void (under-dense) regions are more clustered.

Looking beyond the first peak it can be seen that the surface does not flatten off
like the ΛCDM model. In fact, on scales beyond 30 $h^{-1}$ Mpc there is no transition to
homogeneity in this universe; on scales up to the size of the simulation it is entirely
multifractal.

It is evident, therefore, that on large scales the $D_{q,R}$ surface can in principle
discriminate between different cosmological models.

²This same point has been made by Bernard Jones in many of his papers, most recently in the
review article (Jones et al., 2005).

Figure 5.1: This is a 2-D view of a 3-D Levy Flight distribution of particles, with
parameters: $r_0 = 0.2$, $D = 1.2$.
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Figure 5.2: D2 evaluated for a Levy Flight distribution of particles. The volume (red), 
capacity (blue) and angular (black) corrections are used to correct for the boundary of the 
simulation. The error bars are from our prescription as described in §4.6. The solid blue
line corresponds to the analytically determined $D_2$ value and the two vertical dashed lines
represent the average nearest neighbour separation and the mean inter-particle spacing
respectively.

Figure 5.3: This is a 3-D surface fit of $D_q(r)$. The data used are the halo positions from a
ΛCDM cosmology, described in Warren et al. 2006.

Figure 5.4: Sky distribution of galaxies in the PSCz and in two N-body mock catalogues.
From the top to the bottom we illustrate mocks drawn from the LCDM and SCDMG
cosmologies, respectively. The Aitoff projection is in Galactic coordinates.

Figure 5.5: Histogram of the radial number count, $dN$, in the PSCz survey. The solid line
represents the expected number in each bin after the inclusion of the selection function, $\phi$,
i.e. $dN = 4\pi n\,\phi(r)\,r^2\,dr$.

Figure 5.6: $D_{q,R}$ surfaces for two different cosmologies. The white reference line
corresponds to the correlation dimension, which is usually obtained from the derivative of
the 2PCF. Top: A ΛCDM cosmology. Bottom: An SCDMG cosmology. The two white
reference curves are very similar on small scales in that they give the usual value of $D_2 \approx 2$.
On larger scales, however, it is evident that the $D_{q,R}$ surface can in principle differentiate
between different cosmological models.
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Summary 

We have shown that our volume correction recovers the true fractal dimension when
tested with the extreme case of an anisotropic Levy Flight. The increase in accuracy
over Pan & Coles' angular correction estimator is marginal; however, the
computational load required is much less. This will become increasingly important
as redshift surveys are now pushing 1,000,000+ galaxies.

We present the $D_{q,R}$ surface as a possible unique descriptor of a discrete point
distribution. Whether this is strictly true or not does not alter the fact that
there is much more information contained in our multifractal analysis than can be
extracted from the usual 2PCF. In fact, going from a 2PCF to our multifractal measure
is of no significant computational cost.

The 2PCF, in our formalism, is represented by the integral along the $q = 2$ line in
figures (5.6) & (5.3). It is clearly apparent that $q = 2$ is confined to a rather boring
and flat part of the surface, and generally for $q > 1$ the surface is very smooth and
$D_q$ tends towards a constant value without any interesting features.

There has been a lot of interest, since the dawn of cosmological simulations, in
applying statistics to the resulting distributions so that a comparison can be made with
real surveys. Much of this effort has involved the 2PCF (also Minimal Spanning
Trees and other geometrically motivated descriptors). However, it must be made
clear that the trade-off between statistical robustness and visual interpretation is
of prime importance when quantifying structure. Neither should be favoured too
heavily.

As we have presented here, our $D_{q,R}$ surface gives varying weights to under-dense
(void) regions and over-dense (clustered) regions; this has a significant
advantage over calculating higher order correlations. The major task now, regarding
the $D_{q,R}$ surface, is to make full use of it and to extract as much information as is
statistically possible.

Our methodology as presented in §3.4.1 also has the possibility to contribute to
parameter estimation, especially in the area of Baryon Acoustic Oscillations (BAO).
BAOs are difficult to measure accurately due to the low amplitude on scales above
100 Mpc. To help increase the power in this range, clusters, instead of individual
galaxies, could be used as they show a higher clustering amplitude. This technique
has some drawbacks. The biggest problem it faces is in accurately determining what is
a cluster. Another technique for measuring BAOs would be to use the $D_{q,R}$ surface
as we introduced in §5.2.1. Since this method treats dense and under-dense regions
differently, it suggests that the usual 2PCF (or $Z(2, r)$) may not be the best suited
to observing and measuring BAOs. In future work we hope to explore the use of
the $D_{q,R}$ surface as a cosmological tool.
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